Overview

Dataset statistics

Number of variables: 60
Number of observations: 28800
Missing cells: 189575
Missing cells (%): 11.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 63.1 MiB
Average record size in memory: 2.2 KiB
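
The missing-cell percentage can be checked from the counts above. This is a minimal sketch, assuming the percentage is the number of missing cells divided by variables times observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 60
n_observations = 28800
missing_cells = 189575

# Assumption: every variable has one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Under that assumption the result rounds to the 11.0% shown in the report.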

Variable types

Numeric: 12
Categorical: 45
Boolean: 3

Alerts

efs is highly overall correlated with efs_time (High correlation)
efs_time is highly overall correlated with efs (High correlation)
graft_type is highly overall correlated with prod_type (High correlation)
gvhd_proph is highly overall correlated with hla_match_drb1_low (High correlation)
hla_high_res_10 is highly overall correlated with hla_high_res_6 and 15 other fields (High correlation)
hla_high_res_6 is highly overall correlated with hla_high_res_10 and 12 other fields (High correlation)
hla_high_res_8 is highly overall correlated with hla_high_res_10 and 14 other fields (High correlation)
hla_low_res_10 is highly overall correlated with hla_high_res_10 and 15 other fields (High correlation)
hla_low_res_6 is highly overall correlated with hla_high_res_10 and 13 other fields (High correlation)
hla_low_res_8 is highly overall correlated with hla_high_res_10 and 14 other fields (High correlation)
hla_match_a_high is highly overall correlated with hla_high_res_10 and 8 other fields (High correlation)
hla_match_a_low is highly overall correlated with hla_high_res_10 and 8 other fields (High correlation)
hla_match_b_high is highly overall correlated with hla_high_res_10 and 11 other fields (High correlation)
hla_match_b_low is highly overall correlated with hla_high_res_10 and 11 other fields (High correlation)
hla_match_c_high is highly overall correlated with hla_high_res_10 and 11 other fields (High correlation)
hla_match_c_low is highly overall correlated with hla_high_res_10 and 10 other fields (High correlation)
hla_match_dqb1_high is highly overall correlated with hla_high_res_10 and 4 other fields (High correlation)
hla_match_dqb1_low is highly overall correlated with hla_high_res_10 and 2 other fields (High correlation)
hla_match_drb1_high is highly overall correlated with hla_high_res_10 and 10 other fields (High correlation)
hla_match_drb1_low is highly overall correlated with gvhd_proph and 16 other fields (High correlation)
hla_nmdp_6 is highly overall correlated with hla_high_res_10 and 12 other fields (High correlation)
prod_type is highly overall correlated with graft_type (High correlation)
tce_div_match is highly overall correlated with tce_imm_match (High correlation)
tce_imm_match is highly overall correlated with tce_div_match (High correlation)
psych_disturb is highly imbalanced (61.1%) (Imbalance)
diabetes is highly imbalanced (56.7%) (Imbalance)
tbi_status is highly imbalanced (50.9%) (Imbalance)
arrhythmia is highly imbalanced (79.9%) (Imbalance)
vent_hist is highly imbalanced (81.2%) (Imbalance)
renal_issue is highly imbalanced (93.1%) (Imbalance)
pulm_severe is highly imbalanced (74.7%) (Imbalance)
tce_imm_match is highly imbalanced (57.2%) (Imbalance)
rituximab is highly imbalanced (84.1%) (Imbalance)
hla_match_dqb1_low is highly imbalanced (50.0%) (Imbalance)
ethnicity is highly imbalanced (60.5%) (Imbalance)
obesity is highly imbalanced (75.4%) (Imbalance)
hepatic_severe is highly imbalanced (76.5%) (Imbalance)
prior_tumor is highly imbalanced (63.1%) (Imbalance)
peptic_ulcer is highly imbalanced (91.5%) (Imbalance)
rheum_issue is highly imbalanced (89.0%) (Imbalance)
hepatic_mild is highly imbalanced (75.1%) (Imbalance)
cardiac is highly imbalanced (76.8%) (Imbalance)
pulm_moderate is highly imbalanced (51.6%) (Imbalance)
psych_disturb has 2062 (7.2%) missing values (Missing)
cyto_score has 8068 (28.0%) missing values (Missing)
diabetes has 2119 (7.4%) missing values (Missing)
hla_match_c_high has 4620 (16.0%) missing values (Missing)
hla_high_res_8 has 5829 (20.2%) missing values (Missing)
arrhythmia has 2202 (7.6%) missing values (Missing)
hla_low_res_6 has 3270 (11.4%) missing values (Missing)
renal_issue has 1915 (6.6%) missing values (Missing)
pulm_severe has 2135 (7.4%) missing values (Missing)
hla_high_res_6 has 5284 (18.3%) missing values (Missing)
cmv_status has 634 (2.2%) missing values (Missing)
hla_high_res_10 has 7163 (24.9%) missing values (Missing)
hla_match_dqb1_high has 5199 (18.1%) missing values (Missing)
tce_imm_match has 11133 (38.7%) missing values (Missing)
hla_nmdp_6 has 4197 (14.6%) missing values (Missing)
hla_match_c_low has 2800 (9.7%) missing values (Missing)
rituximab has 2148 (7.5%) missing values (Missing)
hla_match_drb1_low has 2643 (9.2%) missing values (Missing)
hla_match_dqb1_low has 4194 (14.6%) missing values (Missing)
cyto_score_detail has 11923 (41.4%) missing values (Missing)
conditioning_intensity has 4789 (16.6%) missing values (Missing)
ethnicity has 587 (2.0%) missing values (Missing)
obesity has 1760 (6.1%) missing values (Missing)
mrd_hct has 16597 (57.6%) missing values (Missing)
tce_match has 18996 (66.0%) missing values (Missing)
hla_match_a_high has 4301 (14.9%) missing values (Missing)
hepatic_severe has 1871 (6.5%) missing values (Missing)
donor_age has 1808 (6.3%) missing values (Missing)
prior_tumor has 1678 (5.8%) missing values (Missing)
hla_match_b_low has 2565 (8.9%) missing values (Missing)
peptic_ulcer has 2419 (8.4%) missing values (Missing)
hla_match_a_low has 2390 (8.3%) missing values (Missing)
rheum_issue has 2183 (7.6%) missing values (Missing)
hla_match_b_high has 4088 (14.2%) missing values (Missing)
comorbidity_score has 477 (1.7%) missing values (Missing)
karnofsky_score has 870 (3.0%) missing values (Missing)
hepatic_mild has 1917 (6.7%) missing values (Missing)
tce_div_match has 11396 (39.6%) missing values (Missing)
melphalan_dose has 1405 (4.9%) missing values (Missing)
hla_low_res_8 has 3653 (12.7%) missing values (Missing)
cardiac has 2542 (8.8%) missing values (Missing)
hla_match_drb1_high has 3352 (11.6%) missing values (Missing)
pulm_moderate has 2047 (7.1%) missing values (Missing)
hla_low_res_10 has 5064 (17.6%) missing values (Missing)
ID is uniformly distributed (Uniform)
ID has unique values (Unique)
comorbidity_score has 10738 (37.3%) zeros (Zeros)
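
The imbalance percentages in the alerts appear consistent with one minus the normalized Shannon entropy of the non-missing class counts. That formula is an assumption inferred from the figures in this report, not something the report states; the function name `imbalance_score` is hypothetical. The sketch below applies it to the psych_disturb and vent_hist counts listed in their variable sections.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts (assumed formula, see lead-in).

    H is the Shannon entropy in bits and k is the number of classes.
    0 means perfectly balanced classes, values near 1 mean one class dominates.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))


# Non-missing class counts copied from the variable sections of this report.
psych_disturb = [23005, 3587, 146]  # No, Yes, Not done
vent_hist = [27721, 820]            # False, True

print(f"psych_disturb: {100 * imbalance_score(psych_disturb):.1f}%")
print(f"vent_hist: {100 * imbalance_score(vent_hist):.1f}%")
```

Both results round to the values in the alerts (61.1% and 81.2%), which is why the assumed formula seems plausible here.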

Reproduction

Analysis started: 2024-12-17 11:00:41.486621
Analysis finished: 2024-12-17 11:01:12.949297
Duration: 31.46 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
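
The reported duration is the difference between the two timestamps above. A small check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-12-17 11:00:41.486621")
finished = datetime.fromisoformat("2024-12-17 11:01:12.949297")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```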

Variables

ID
Real number (ℝ)

UNIFORM  UNIQUE 

Distinct: 28800
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14399.5
Minimum: 0
Maximum: 28799
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1439.95
Q1: 7199.75
Median: 14399.5
Q3: 21599.25
95-th percentile: 27359.05
Maximum: 28799
Range: 28799
Interquartile range (IQR): 14399.5
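
The quantile values above match linear interpolation between neighbouring order statistics for the IDs 0 to 28799. The sketch below assumes that interpolation rule; the helper `quantile` is hypothetical and written out by hand rather than taken from the profiling tool.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# ID takes every integer from 0 to 28799 exactly once (see Minimum, Maximum, Distinct).
ids = list(range(28800))

p5 = quantile(ids, 0.05)
q1 = quantile(ids, 0.25)
q3 = quantile(ids, 0.75)
iqr = q3 - q1

print(p5, q1, q3, iqr)
```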

Descriptive statistics

Standard deviation: 8313.9882
Coefficient of variation (CV): 0.57738034
Kurtosis: -1.2
Mean: 14399.5
Median Absolute Deviation (MAD): 7200
Skewness: 0
Sum: 4.147056 × 10^8
Variance: 69122400
Monotonicity: Strictly increasing
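
Because ID is simply the integers 0 to 28799, the descriptive statistics above can be reproduced directly. This sketch assumes the report uses the sample variance (n - 1 denominator) and defines the coefficient of variation as standard deviation divided by mean; both assumptions agree with the printed figures.

```python
import math
import statistics

ids = list(range(28800))

mean = statistics.mean(ids)
variance = statistics.variance(ids)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                      # coefficient of variation
total = sum(ids)

print(mean, variance, round(std, 4), round(cv, 8), total)
```

The sum evaluates to 414705600, that is 4.147056 × 10^8.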
Histogram with fixed size bins (bins=50)
Common values

Value                 Count  Frequency (%)
0                     1      < 0.1%
19237                 1      < 0.1%
19207                 1      < 0.1%
19206                 1      < 0.1%
19205                 1      < 0.1%
19204                 1      < 0.1%
19203                 1      < 0.1%
19202                 1      < 0.1%
19201                 1      < 0.1%
19200                 1      < 0.1%
Other values (28790)  28790  > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
28799  1      < 0.1%
28798  1      < 0.1%
28797  1      < 0.1%
28796  1      < 0.1%
28795  1      < 0.1%
28794  1      < 0.1%
28793  1      < 0.1%
28792  1      < 0.1%
28791  1      < 0.1%
28790  1      < 0.1%

dri_score
Categorical

Distinct: 11
Distinct (%): < 0.1%
Missing: 154
Missing (%): 0.5%
Memory size: 2.0 MiB

Intermediate: 10436
N/A - pediatric: 4779
High: 4701
N/A - non-malignant indication: 2427
TBD cytogenetics: 2003
Other values (6): 4300

Length

Max length: 49
Median length: 41
Mean length: 14.593311
Min length: 3

Characters and Unicode

Total characters: 418040
Distinct characters: 35
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: N/A - non-malignant indication
2nd row: Intermediate
3rd row: N/A - non-malignant indication
4th row: High
5th row: High

Common Values

Value                                              Count  Frequency (%)
Intermediate                                       10436  36.2%
N/A - pediatric                                    4779   16.6%
High                                               4701   16.3%
N/A - non-malignant indication                     2427   8.4%
TBD cytogenetics                                   2003   7.0%
Low                                                1926   6.7%
High - TED AML case <missing cytogenetics          1414   4.9%
Intermediate - TED AML case <missing cytogenetics  481    1.7%
N/A - disease not classifiable                     272    0.9%
Very high                                          198    0.7%
(Missing)                                          154    0.5%

Length

Histogram of lengths of the category

Words

Word              Count  Frequency (%)
intermediate      10917  18.1%
(blank)           9373   15.6%
n/a               7478   12.4%
high              6313   10.5%
pediatric         4779   7.9%
cytogenetics      3898   6.5%
non-malignant     2427   4.0%
indication        2427   4.0%
tbd               2003   3.3%
low               1926   3.2%
Other values (9)  8621   14.3%

Most occurring characters

Character          Count   Frequency (%)
e                  48253   11.5%
i                  45027   10.8%
t                  39553   9.5%
n                  31553   7.5%
(space)            31516   7.5%
a                  25706   6.1%
d                  18404   4.4%
c                  17169   4.1%
r                  15894   3.8%
m                  15239   3.6%
Other values (25)  129726  31.0%

Most occurring categories, scripts, and blocks

Grouping  Value      Count   Frequency (%)
Category  (unknown)  418040  100.0%
Script    (unknown)  418040  100.0%
Block     (unknown)  418040  100.0%

psych_disturb
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2062
Missing (%): 7.2%
Memory size: 1.6 MiB

No: 23005
Yes: 3587
Not done: 146

Length

Max length: 8
Median length: 2
Mean length: 2.166916
Min length: 2

Characters and Unicode

Total characters: 57939
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value      Count  Frequency (%)
No         23005  79.9%
Yes        3587   12.5%
Not done   146    0.5%
(Missing)  2062   7.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word  Count  Frequency (%)
no    23005  85.6%
yes   3587   13.3%
not   146    0.5%
done  146    0.5%
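
The same count yields different percentages in the two tables above because the denominators differ. The sketch below assumes the value table divides by all 28800 rows (missing rows included) and the word table divides by the total number of whitespace-separated, lowercased words. That tokenization is an assumption, but it reproduces the printed psych_disturb figures.

```python
n_rows = 28800
value_counts = {"No": 23005, "Yes": 3587, "Not done": 146}  # non-missing values

# Value table: share of all rows, so the 2062 missing rows stay in the denominator.
row_share = {value: 100 * count / n_rows for value, count in value_counts.items()}

# Word table (assumed tokenization): lowercase each value and split on whitespace.
word_counts = {}
for value, count in value_counts.items():
    for word in value.lower().split():
        word_counts[word] = word_counts.get(word, 0) + count

n_words = sum(word_counts.values())
word_share = {word: 100 * count / n_words for word, count in word_counts.items()}

print(round(row_share["No"], 1), round(word_share["no"], 1))
```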

Most occurring characters

Character  Count  Frequency (%)
o          23297  40.2%
N          23151  40.0%
e          3733   6.4%
Y          3587   6.2%
s          3587   6.2%
t          146    0.3%
(space)    146    0.3%
d          146    0.3%
n          146    0.3%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  57939  100.0%
Script    (unknown)  57939  100.0%
Block     (unknown)  57939  100.0%

cyto_score
Categorical

MISSING 

Distinct: 7
Distinct (%): < 0.1%
Missing: 8068
Missing (%): 28.0%
Memory size: 1.6 MiB

Poor: 8802
Intermediate: 6376
Favorable: 3011
TBD: 1341
Normal: 643
Other values (2): 559

Length

Max length: 12
Median length: 10
Mean length: 7.224098
Min length: 3

Characters and Unicode

Total characters: 149770
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Intermediate
2nd row: Intermediate
3rd row: Poor
4th row: Poor
5th row: Other

Common Values

Value         Count  Frequency (%)
Poor          8802   30.6%
Intermediate  6376   22.1%
Favorable     3011   10.5%
TBD           1341   4.7%
Normal        643    2.2%
Other         504    1.8%
Not tested    55     0.2%
(Missing)     8068   28.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word          Count  Frequency (%)
poor          8802   42.3%
intermediate  6376   30.7%
favorable     3011   14.5%
tbd           1341   6.5%
normal        643    3.1%
other         504    2.4%
not           55     0.3%
tested        55     0.3%

Most occurring characters

Character          Count  Frequency (%)
e                  22753  15.2%
o                  21313  14.2%
r                  19336  12.9%
t                  13421  9.0%
a                  13041  8.7%
P                  8802   5.9%
m                  7019   4.7%
d                  6431   4.3%
I                  6376   4.3%
n                  6376   4.3%
Other values (13)  24902  16.6%

Most occurring categories, scripts, and blocks

Grouping  Value      Count   Frequency (%)
Category  (unknown)  149770  100.0%
Script    (unknown)  149770  100.0%
Block     (unknown)  149770  100.0%

diabetes
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2119
Missing (%): 7.4%
Memory size: 1.6 MiB

No: 22201
Yes: 4339
Not done: 141

Length

Max length: 8
Median length: 2
Mean length: 2.194333
Min length: 2

Characters and Unicode

Total characters: 58547
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value      Count  Frequency (%)
No         22201  77.1%
Yes        4339   15.1%
Not done   141    0.5%
(Missing)  2119   7.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word  Count  Frequency (%)
no    22201  82.8%
yes   4339   16.2%
not   141    0.5%
done  141    0.5%

Most occurring characters

Character  Count  Frequency (%)
o          22483  38.4%
N          22342  38.2%
e          4480   7.7%
Y          4339   7.4%
s          4339   7.4%
t          141    0.2%
(space)    141    0.2%
d          141    0.2%
n          141    0.2%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  58547  100.0%
Script    (unknown)  58547  100.0%
Block     (unknown)  58547  100.0%

hla_match_c_high
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 4620
Missing (%): 16.0%
Memory size: 1.6 MiB

2.0: 18565
1.0: 5536
0.0: 79

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 72540
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value      Count  Frequency (%)
2.0        18565  64.5%
1.0        5536   19.2%
0.0        79     0.3%
(Missing)  4620   16.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word  Count  Frequency (%)
2.0   18565  76.8%
1.0   5536   22.9%
0.0   79     0.3%

Most occurring characters

Character  Count  Frequency (%)
0          24259  33.4%
.          24180  33.3%
2          18565  25.6%
1          5536   7.6%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  72540  100.0%
Script    (unknown)  72540  100.0%
Block     (unknown)  72540  100.0%

hla_high_res_8
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 7
Distinct (%): < 0.1%
Missing: 5829
Missing (%): 20.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.8768012
Minimum: 2
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 6
Median: 8
Q3: 8
95-th percentile: 8
Maximum: 8
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.5643134
Coefficient of variation (CV): 0.2274769
Kurtosis: -0.73124988
Mean: 6.8768012
Median Absolute Deviation (MAD): 0
Skewness: -0.97246278
Sum: 157967
Variance: 2.4470765
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Common values

Value      Count  Frequency (%)
8          13568  47.1%
4          3820   13.3%
7          2385   8.3%
5          1648   5.7%
6          1520   5.3%
3          28     0.1%
2          2      < 0.1%
(Missing)  5829   20.2%

Minimum 10 values

Value  Count  Frequency (%)
2      2      < 0.1%
3      28     0.1%
4      3820   13.3%
5      1648   5.7%
6      1520   5.3%
7      2385   8.3%
8      13568  47.1%

Maximum 10 values

Value  Count  Frequency (%)
8      13568  47.1%
7      2385   8.3%
6      1520   5.3%
5      1648   5.7%
4      3820   13.3%
3      28     0.1%
2      2      < 0.1%
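
The mean, sum, and variance reported for hla_high_res_8 can be recovered from its value counts alone. This is a sketch under the assumption that missing rows are excluded and that the variance uses the n - 1 denominator; the variable names are illustrative.

```python
# Value -> count, copied from the hla_high_res_8 tables (missing rows excluded).
value_counts = {8: 13568, 4: 3820, 7: 2385, 5: 1648, 6: 1520, 3: 28, 2: 2}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
sum_sq_dev = sum(count * (value - mean) ** 2 for value, count in value_counts.items())
variance = sum_sq_dev / (n - 1)

print(n, total, round(mean, 7), round(variance, 7))
```

The non-missing count of 22971 equals 28800 observations minus the 5829 missing values.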

tbi_status
Categorical

IMBALANCE 

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 MiB

No TBI: 18861
TBI + Cy +- Other: 6104
TBI +- Other, <=cGy: 1727
TBI +- Other, >cGy: 1700
TBI +- Other, -cGy, single: 134
Other values (3): 274

Length

Max length: 32
Median length: 6
Mean length: 10.143854
Min length: 6

Characters and Unicode

Total characters: 292143
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No TBI
2nd row: TBI +- Other, >cGy
3rd row: No TBI
4th row: No TBI
5th row: No TBI

Common Values

Value                             Count  Frequency (%)
No TBI                            18861  65.5%
TBI + Cy +- Other                 6104   21.2%
TBI +- Other, <=cGy               1727   6.0%
TBI +- Other, >cGy                1700   5.9%
TBI +- Other, -cGy, single        134    0.5%
TBI +- Other, -cGy, fractionated  119    0.4%
TBI +- Other, -cGy, unknown dose  79     0.3%
TBI +- Other, unknown dose        76     0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word          Count  Frequency (%)
tbi           28800  34.3%
no            18861  22.4%
(blank)       16043  19.1%
other         9939   11.8%
cy            6104   7.3%
cgy           3759   4.5%
unknown       155    0.2%
dose          155    0.2%
single        134    0.2%
fractionated  119    0.1%

Most occurring characters

Character          Count  Frequency (%)
(space)            55269  18.9%
T                  28800  9.9%
B                  28800  9.9%
I                  28800  9.9%
o                  19290  6.6%
N                  18861  6.5%
+                  16043  5.5%
e                  10347  3.5%
-                  10271  3.5%
t                  10177  3.5%
Other values (22)  65485  22.4%

Most occurring categories, scripts, and blocks

Grouping  Value      Count   Frequency (%)
Category  (unknown)  292143  100.0%
Script    (unknown)  292143  100.0%
Block     (unknown)  292143  100.0%

arrhythmia
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2202
Missing (%): 7.6%
Memory size: 1.6 MiB

No: 25203
Yes: 1277
Not done: 118

Length

Max length: 8
Median length: 2
Mean length: 2.0746297
Min length: 2

Characters and Unicode

Total characters: 55181
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value      Count  Frequency (%)
No         25203  87.5%
Yes        1277   4.4%
Not done   118    0.4%
(Missing)  2202   7.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word  Count  Frequency (%)
no    25203  94.3%
yes   1277   4.8%
not   118    0.4%
done  118    0.4%

Most occurring characters

Character  Count  Frequency (%)
o          25439  46.1%
N          25321  45.9%
e          1395   2.5%
Y          1277   2.3%
s          1277   2.3%
t          118    0.2%
(space)    118    0.2%
d          118    0.2%
n          118    0.2%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  55181  100.0%
Script    (unknown)  55181  100.0%
Block     (unknown)  55181  100.0%

hla_low_res_6
Categorical

HIGH CORRELATION  MISSING 

Distinct: 5
Distinct (%): < 0.1%
Missing: 3270
Missing (%): 11.4%
Memory size: 1.6 MiB

6.0: 15690
3.0: 4955
5.0: 2808
4.0: 2055
2.0: 22

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 76590
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 6.0
2nd row: 6.0
3rd row: 6.0
4th row: 6.0
5th row: 6.0

Common Values

Value      Count  Frequency (%)
6.0        15690  54.5%
3.0        4955   17.2%
5.0        2808   9.8%
4.0        2055   7.1%
2.0        22     0.1%
(Missing)  3270   11.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word  Count  Frequency (%)
6.0   15690  61.5%
3.0   4955   19.4%
5.0   2808   11.0%
4.0   2055   8.0%
2.0   22     0.1%

Most occurring characters

Character  Count  Frequency (%)
.          25530  33.3%
0          25530  33.3%
6          15690  20.5%
3          4955   6.5%
5          2808   3.7%
4          2055   2.7%
2          22     < 0.1%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  76590  100.0%
Script    (unknown)  76590  100.0%
Block     (unknown)  76590  100.0%

graft_type
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Peripheral blood: 20546
Bone marrow: 8254

Length

Max length: 16
Median length: 16
Mean length: 14.567014
Min length: 11

Characters and Unicode

Total characters: 419530
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bone marrow
2nd row: Peripheral blood
3rd row: Bone marrow
4th row: Bone marrow
5th row: Peripheral blood

Common Values

Value             Count  Frequency (%)
Peripheral blood  20546  71.3%
Bone marrow       8254   28.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word        Count  Frequency (%)
peripheral  20546  35.7%
blood       20546  35.7%
bone        8254   14.3%
marrow      8254   14.3%

Most occurring characters

Character         Count  Frequency (%)
r                 57600  13.7%
o                 57600  13.7%
e                 49346  11.8%
l                 41092  9.8%
a                 28800  6.9%
(space)           28800  6.9%
P                 20546  4.9%
i                 20546  4.9%
p                 20546  4.9%
h                 20546  4.9%
Other values (6)  74108  17.7%

Most occurring categories, scripts, and blocks

Grouping  Value      Count   Frequency (%)
Category  (unknown)  419530  100.0%
Script    (unknown)  419530  100.0%
Block     (unknown)  419530  100.0%

vent_hist
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 259
Missing (%): 0.9%
Memory size: 56.4 KiB

Value      Count  Frequency (%)
False      27721  96.3%
True       820    2.8%
(Missing)  259    0.9%

renal_issue
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 1915
Missing (%): 6.6%
Memory size: 1.6 MiB

No: 26548
Yes: 200
Not done: 137

Length

Max length: 8
Median length: 2
Mean length: 2.0380138
Min length: 2

Characters and Unicode

Total characters: 54792
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value      Count  Frequency (%)
No         26548  92.2%
Yes        200    0.7%
Not done   137    0.5%
(Missing)  1915   6.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word  Count  Frequency (%)
no    26548  98.2%
yes   200    0.7%
not   137    0.5%
done  137    0.5%

Most occurring characters

Character  Count  Frequency (%)
o          26822  49.0%
N          26685  48.7%
e          337    0.6%
Y          200    0.4%
s          200    0.4%
t          137    0.3%
(space)    137    0.3%
d          137    0.3%
n          137    0.3%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  54792  100.0%
Script    (unknown)  54792  100.0%
Block     (unknown)  54792  100.0%

pulm_severe
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2135
Missing (%): 7.4%
Memory size: 1.6 MiB

No: 24779
Yes: 1706
Not done: 180

Length

Max length: 8
Median length: 2
Mean length: 2.1044815
Min length: 2

Characters and Unicode

Total characters: 56116
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value      Count  Frequency (%)
No         24779  86.0%
Yes        1706   5.9%
Not done   180    0.6%
(Missing)  2135   7.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Word  Count  Frequency (%)
no    24779  92.3%
yes   1706   6.4%
not   180    0.7%
done  180    0.7%

Most occurring characters

Character  Count  Frequency (%)
o          25139  44.8%
N          24959  44.5%
e          1886   3.4%
Y          1706   3.0%
s          1706   3.0%
t          180    0.3%
(space)    180    0.3%
d          180    0.3%
n          180    0.3%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  56116  100.0%
Script    (unknown)  56116  100.0%
Block     (unknown)  56116  100.0%

prim_disease_hct
Categorical

Distinct: 18
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

Top values: ALL 8102; AML 7135; MDS 3046; IPA 1719; MPN 1656; Other values (13) 7142

Length

Max length: 20
Median length: 3
Mean length: 3.2288194
Min length: 2

Characters and Unicode

Total characters: 92990
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IEA
2nd row: AML
3rd row: HIS
4th row: ALL
5th row: MPN

Common Values

Value             Count  Frequency (%)
ALL                8102  28.1%
AML                7135  24.8%
MDS                3046  10.6%
IPA                1719   6.0%
MPN                1656   5.8%
IEA                1449   5.0%
NHL                1319   4.6%
IIS                1024   3.6%
PCD                 869   3.0%
SAA                 713   2.5%
Other values (8)   1768   6.1%

Length

Histogram of lengths of the category

Words

Value              Count  Frequency (%)
all                 8102  27.4%
aml                 7135  24.2%
mds                 3046  10.3%
ipa                 1719   5.8%
mpn                 1656   5.6%
iea                 1449   4.9%
nhl                 1319   4.5%
iis                 1024   3.5%
pcd                  869   2.9%
saa                  713   2.4%
Other values (10)   2507   8.5%

Most occurring characters

Value              Count  Frequency (%)
L                  24678  26.5%
A                  20280  21.8%
M                  12001  12.9%
I                   6254   6.7%
S                   5435   5.8%
P                   4244   4.6%
D                   4113   4.4%
N                   2975   3.2%
H                   1818   2.0%
E                   1449   1.6%
Other values (16)   9743  10.5%

hla_high_res_6
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 6
Distinct (%): < 0.1%
Missing: 5284
Missing (%): 18.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.1092022
Minimum: 0
Maximum: 6
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 4
Median: 6
Q3: 6
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.2141621
Coefficient of variation (CV): 0.23764221
Kurtosis: -0.89214096
Mean: 5.1092022
Median Absolute Deviation (MAD): 0
Skewness: -0.89215563
Sum: 120148
Variance: 1.4741896
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)
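
The descriptive statistics for `hla_high_res_6` can be cross-checked against the value counts reported for this variable. The sketch below recomputes the sum, mean, variance, standard deviation, and coefficient of variation from the frequency table alone. It assumes the variance uses the sample (n - 1) denominator, which is the pandas default but is not stated in the report; the function name `summarize` is hypothetical.

```python
from math import sqrt

# Value counts reported for hla_high_res_6 (the 5284 missing rows are excluded).
counts = {6: 14022, 3: 4596, 5: 2726, 4: 2128, 2: 43, 0: 1}

def summarize(freq):
    """Recompute summary statistics from a value -> count mapping."""
    n = sum(freq.values())
    total = sum(value * count for value, count in freq.items())
    mean = total / n
    # Assumption: sample variance with an (n - 1) denominator.
    variance = sum(count * (value - mean) ** 2 for value, count in freq.items()) / (n - 1)
    std = sqrt(variance)
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation = std / mean
    }

stats = summarize(counts)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under that assumption the recomputed values agree with the figures listed under "Descriptive statistics" to the precision shown there.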

Common values

Value      Count  Frequency (%)
6          14022  48.7%
3           4596  16.0%
5           2726   9.5%
4           2128   7.4%
2             43   0.1%
0              1  < 0.1%
(Missing)   5284  18.3%

Smallest values

Value  Count  Frequency (%)
0          1  < 0.1%
2         43   0.1%
3       4596  16.0%
4       2128   7.4%
5       2726   9.5%
6      14022  48.7%

Largest values

Value  Count  Frequency (%)
6      14022  48.7%
5       2726   9.5%
4       2128   7.4%
3       4596  16.0%
2         43   0.1%
0          1  < 0.1%

cmv_status
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 634
Missing (%): 2.2%
Memory size: 1.6 MiB

Top values: +/+ 13596; -/+ 7081; +/- 4048; -/- 3441

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 84498
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: +/+
2nd row: +/+
3rd row: +/+
4th row: +/+
5th row: +/+

Common Values

Value      Count  Frequency (%)
+/+        13596  47.2%
-/+         7081  24.6%
+/-         4048  14.1%
-/-         3441  11.9%
(Missing)    634   2.2%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value    Count  Frequency (%)
(empty)  28166  100.0%

Most occurring characters

Value  Count  Frequency (%)
+      38321  45.4%
/      28166  33.3%
-      18011  21.3%

hla_high_res_10
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 8
Distinct (%): < 0.1%
Missing: 7163
Missing (%): 24.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.6172297
Minimum: 3
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 3
5-th percentile: 5
Q1: 7
Median: 10
Q3: 10
95-th percentile: 10
Maximum: 10
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.9051251
Coefficient of variation (CV): 0.22108324
Kurtosis: -0.64856223
Mean: 8.6172297
Median Absolute Deviation (MAD): 0
Skewness: -0.99847927
Sum: 186451
Variance: 3.6295016
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value      Count  Frequency (%)
10         12232  42.5%
5           3161  11.0%
9           2369   8.2%
6           1355   4.7%
8           1314   4.6%
7           1180   4.1%
4             25   0.1%
3              1  < 0.1%
(Missing)   7163  24.9%

Smallest values

Value  Count  Frequency (%)
3          1  < 0.1%
4         25   0.1%
5       3161  11.0%
6       1355   4.7%
7       1180   4.1%
8       1314   4.6%
9       2369   8.2%
10     12232  42.5%

Largest values

Value  Count  Frequency (%)
10     12232  42.5%
9       2369   8.2%
8       1314   4.6%
7       1180   4.1%
6       1355   4.7%
5       3161  11.0%
4         25   0.1%
3          1  < 0.1%

hla_match_dqb1_high
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 5199
Missing (%): 18.1%
Memory size: 1.5 MiB

Top values: 2.0 17468; 1.0 6056; 0.0 77

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 70803
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value      Count  Frequency (%)
2.0        17468  60.7%
1.0         6056  21.0%
0.0           77   0.3%
(Missing)   5199  18.1%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
2.0    17468  74.0%
1.0     6056  25.7%
0.0       77   0.3%

Most occurring characters

Value  Count  Frequency (%)
0      23678  33.4%
.      23601  33.3%
2      17468  24.7%
1       6056   8.6%

tce_imm_match
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 8
Distinct (%): < 0.1%
Missing: 11133
Missing (%): 38.7%
Memory size: 1.4 MiB

Top values: P/P 13114; G/G 2522; H/H 1084; G/B 544; H/B 229; Other values (3) 174

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 53001
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: P/P
2nd row: P/P
3rd row: P/P
4th row: P/P
5th row: P/P

Common Values

Value      Count  Frequency (%)
P/P        13114  45.5%
G/G         2522   8.8%
H/H         1084   3.8%
G/B          544   1.9%
H/B          229   0.8%
P/H           83   0.3%
P/B           66   0.2%
P/G           25   0.1%
(Missing)  11133  38.7%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
p/p    13114  74.2%
g/g     2522  14.3%
h/h     1084   6.1%
g/b      544   3.1%
h/b      229   1.3%
p/h       83   0.5%
p/b       66   0.4%
p/g       25   0.1%

Most occurring characters

Value  Count  Frequency (%)
P      26402  49.8%
/      17667  33.3%
G       5613  10.6%
H       2480   4.7%
B        839   1.6%

hla_nmdp_6
Categorical

HIGH CORRELATION  MISSING 

Distinct: 5
Distinct (%): < 0.1%
Missing: 4197
Missing (%): 14.6%
Memory size: 1.6 MiB

Top values: 6.0 15105; 3.0 4888; 5.0 3296; 4.0 1279; 2.0 35

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 73809
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 6.0
2nd row: 6.0
3rd row: 6.0
4th row: 6.0
5th row: 5.0

Common Values

Value      Count  Frequency (%)
6.0        15105  52.4%
3.0         4888  17.0%
5.0         3296  11.4%
4.0         1279   4.4%
2.0           35   0.1%
(Missing)   4197  14.6%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
6.0    15105  61.4%
3.0     4888  19.9%
5.0     3296  13.4%
4.0     1279   5.2%
2.0       35   0.1%

Most occurring characters

Value  Count  Frequency (%)
.      24603  33.3%
0      24603  33.3%
6      15105  20.5%
3       4888   6.6%
5       3296   4.5%
4       1279   1.7%
2         35  < 0.1%

hla_match_c_low
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2800
Missing (%): 9.7%
Memory size: 1.6 MiB

Top values: 2.0 19782; 1.0 6139; 0.0 79

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 78000
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value      Count  Frequency (%)
2.0        19782  68.7%
1.0         6139  21.3%
0.0           79   0.3%
(Missing)   2800   9.7%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
2.0    19782  76.1%
1.0     6139  23.6%
0.0       79   0.3%

Most occurring characters

Value  Count  Frequency (%)
0      26079  33.4%
.      26000  33.3%
2      19782  25.4%
1       6139   7.9%

rituximab
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 2148
Missing (%): 7.5%
Memory size: 56.4 KiB

Value      Count  Frequency (%)
False      26033  90.4%
True         619   2.1%
(Missing)   2148   7.5%

hla_match_drb1_low
Categorical

HIGH CORRELATION  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 2643
Missing (%): 9.2%
Memory size: 1.6 MiB

Top values: 2.0 18710; 1.0 7447

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 78471
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value      Count  Frequency (%)
2.0        18710  65.0%
1.0         7447  25.9%
(Missing)   2643   9.2%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
2.0    18710  71.5%
1.0     7447  28.5%

Most occurring characters

Value  Count  Frequency (%)
.      26157  33.3%
0      26157  33.3%
2      18710  23.8%
1       7447   9.5%

hla_match_dqb1_low
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 4194
Missing (%): 14.6%
Memory size: 1.6 MiB

Top values: 2.0 19131; 1.0 5384; 0.0 91

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 73818
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value      Count  Frequency (%)
2.0        19131  66.4%
1.0         5384  18.7%
0.0           91   0.3%
(Missing)   4194  14.6%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
2.0    19131  77.7%
1.0     5384  21.9%
0.0       91   0.4%

Most occurring characters

Value  Count  Frequency (%)
0      24697  33.5%
.      24606  33.3%
2      19131  25.9%
1       5384   7.3%

prod_type
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Top values: PB 20381; BM 8419

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 57600
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BM
2nd row: PB
3rd row: BM
4th row: BM
5th row: PB

Common Values

Value  Count  Frequency (%)
PB     20381  70.8%
BM      8419  29.2%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
pb     20381  70.8%
bm      8419  29.2%

Most occurring characters

Value  Count  Frequency (%)
B      28800  50.0%
P      20381  35.4%
M       8419  14.6%

cyto_score_detail
Categorical

MISSING 

Distinct: 5
Distinct (%): < 0.1%
Missing: 11923
Missing (%): 41.4%
Memory size: 1.5 MiB

Top values: Intermediate 11158; Poor 3323; Favorable 1208; TBD 1043; Not tested 145

Length

Max length: 12
Median length: 12
Mean length: 9.6367245
Min length: 3

Characters and Unicode

Total characters: 162639
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Intermediate
2nd row: Intermediate
3rd row: TBD
4th row: Intermediate
5th row: Intermediate

Common Values

Value         Count  Frequency (%)
Intermediate  11158  38.7%
Poor           3323  11.5%
Favorable      1208   4.2%
TBD            1043   3.6%
Not tested      145   0.5%
(Missing)     11923  41.4%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value         Count  Frequency (%)
intermediate  11158  65.6%
poor           3323  19.5%
favorable      1208   7.1%
tbd            1043   6.1%
not             145   0.9%
tested          145   0.9%

Most occurring characters

Value              Count  Frequency (%)
e                  34972  21.5%
t                  22751  14.0%
r                  15689   9.6%
a                  13574   8.3%
d                  11303   6.9%
I                  11158   6.9%
n                  11158   6.9%
m                  11158   6.9%
i                  11158   6.9%
o                   7999   4.9%
Other values (11)  11719   7.2%

conditioning_intensity
Categorical

MISSING 

Distinct: 6
Distinct (%): < 0.1%
Missing: 4789
Missing (%): 16.6%
Memory size: 1.6 MiB

Top values: MAC 12288; RIC 7722; NMA 3479; TBD 373; No drugs reported 87

Length

Max length: 29
Median length: 3
Mean length: 3.1178626
Min length: 3

Characters and Unicode

Total characters: 74863
Distinct characters: 30
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MAC
2nd row: MAC
3rd row: MAC
4th row: MAC
5th row: RIC

Common Values

Value                          Count  Frequency (%)
MAC                            12288  42.7%
RIC                             7722  26.8%
NMA                             3479  12.1%
TBD                              373   1.3%
No drugs reported                 87   0.3%
N/A, F(pre-TED) not submitted     62   0.2%
(Missing)                       4789  16.6%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value      Count  Frequency (%)
mac        12288  50.4%
ric         7722  31.7%
nma         3479  14.3%
tbd          373   1.5%
no            87   0.4%
drugs         87   0.4%
reported      87   0.4%
n/a           62   0.3%
f(pre-ted     62   0.3%
not           62   0.3%

Most occurring characters

Value              Count  Frequency (%)
C                  20010  26.7%
A                  15829  21.1%
M                  15767  21.1%
R                   7722  10.3%
I                   7722  10.3%
N                   3628   4.8%
T                    435   0.6%
D                    435   0.6%
B                    373   0.5%
(space)              360   0.5%
Other values (20)   2582   3.4%

ethnicity
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 587
Missing (%): 2.0%
Memory size: 2.1 MiB

Top values: Not Hispanic or Latino 24482; Hispanic or Latino 3347; Non-resident of the U.S. 384

Length

Max length: 24
Median length: 22
Mean length: 21.552688
Min length: 18

Characters and Unicode

Total characters: 608066
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not Hispanic or Latino
2nd row: Not Hispanic or Latino
3rd row: Not Hispanic or Latino
4th row: Not Hispanic or Latino
5th row: Hispanic or Latino

Common Values

Value                     Count  Frequency (%)
Not Hispanic or Latino    24482  85.0%
Hispanic or Latino         3347  11.6%
Non-resident of the U.S.    384   1.3%
(Missing)                   587   2.0%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value         Count  Frequency (%)
hispanic      27829  25.4%
or            27829  25.4%
latino        27829  25.4%
not           24482  22.4%
non-resident    384   0.4%
of              384   0.4%
the             384   0.4%
u.s             384   0.4%

Most occurring characters

Value              Count  Frequency (%)
i                  83871  13.8%
(space)            81292  13.4%
o                  80908  13.3%
n                  56426   9.3%
a                  55658   9.2%
t                  53079   8.7%
s                  28213   4.6%
r                  28213   4.6%
c                  27829   4.6%
L                  27829   4.6%
Other values (11)  84748  13.9%

year_hct
Real number (ℝ)

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.1794
Minimum: 2008
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 2008
5-th percentile: 2008
Q1: 2013
Median: 2016
Q3: 2018
95-th percentile: 2018
Maximum: 2020
Range: 12
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.1539139
Coefficient of variation (CV): 0.0015650784
Kurtosis: 0.1021579
Mean: 2015.1794
Median Absolute Deviation (MAD): 2
Skewness: -1.093487
Sum: 58037168
Variance: 9.9471729
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=13)
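
Because the value listings for `year_hct` show only ten rows each, the full 13-year distribution has to be pieced together from the ascending and descending listings. The sketch below merges the two and checks the result against the reported totals. The variable names are illustrative, and the assumption is that "Other values (3)" covers exactly the three least frequent years.

```python
# Ten smallest and ten largest year values as reported for year_hct.
smallest = {2008: 2544, 2009: 503, 2010: 378, 2011: 599, 2012: 1571,
            2013: 1871, 2014: 1098, 2015: 2243, 2016: 5049, 2017: 4830}
largest = {2020: 4, 2019: 774, 2018: 7336, 2017: 4830, 2016: 5049,
           2015: 2243, 2014: 1098, 2013: 1871, 2012: 1571, 2011: 599}

# The overlapping years carry identical counts, so a plain merge is safe.
full = {**smallest, **largest}

n_rows = sum(full.values())
year_sum = sum(year * count for year, count in full.items())

# Assumption: "Other values (3)" is everything outside the ten most frequent years.
top10 = sorted(full.values(), reverse=True)[:10]
other_values = n_rows - sum(top10)

print(len(full), n_rows, year_sum, other_values)
```

If the merge is correct, the row total should equal the number of observations, the weighted sum should equal the reported Sum, and the remainder should equal the "Other values (3)" count.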

Common values

Value             Count  Frequency (%)
2018               7336  25.5%
2016               5049  17.5%
2017               4830  16.8%
2008               2544   8.8%
2015               2243   7.8%
2013               1871   6.5%
2012               1571   5.5%
2014               1098   3.8%
2019                774   2.7%
2011                599   2.1%
Other values (3)    885   3.1%

Smallest values

Value  Count  Frequency (%)
2008    2544   8.8%
2009     503   1.7%
2010     378   1.3%
2011     599   2.1%
2012    1571   5.5%
2013    1871   6.5%
2014    1098   3.8%
2015    2243   7.8%
2016    5049  17.5%
2017    4830  16.8%

Largest values

Value  Count  Frequency (%)
2020       4  < 0.1%
2019     774   2.7%
2018    7336  25.5%
2017    4830  16.8%
2016    5049  17.5%
2015    2243   7.8%
2014    1098   3.8%
2013    1871   6.5%
2012    1571   5.5%
2011     599   2.1%

obesity
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 1760
Missing (%): 6.1%
Memory size: 1.6 MiB

Top values: No 25144; Yes 1779; Not done 117

Length

Max length: 8
Median length: 2
Mean length: 2.091753
Min length: 2

Characters and Unicode

Total characters: 56561
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value      Count  Frequency (%)
No         25144  87.3%
Yes         1779   6.2%
Not done     117   0.4%
(Missing)   1760   6.1%

Length

Histogram of lengths of the category
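
The length statistics for `obesity` follow directly from the value counts. As a minimal sketch (the helper `length_stats` is hypothetical, and missing rows are assumed to be excluded from the length calculation), the figures can be reproduced like this:

```python
from statistics import median

# Reported value counts for obesity (the 1760 missing rows are excluded).
value_counts = {"No": 25144, "Yes": 1779, "Not done": 117}

def length_stats(counts):
    """Compute string-length statistics from a value -> count mapping."""
    # Expand to one length per row so the median is taken over rows, not categories.
    lengths = [len(value) for value, n in counts.items() for _ in range(n)]
    total = sum(lengths)
    return {
        "total_characters": total,
        "mean": total / len(lengths),
        "median": median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }

lengths = length_stats(value_counts)
print(lengths)
```

The median stays at 2 because the two-character value "No" accounts for well over half of the non-missing rows.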

[Figure: Common Values (Plot)]

Words

Value  Count  Frequency (%)
no     25144  92.6%
yes     1779   6.6%
not      117   0.4%
done     117   0.4%

Most occurring characters

Value    Count  Frequency (%)
o        25378  44.9%
N        25261  44.7%
e         1896   3.4%
Y         1779   3.1%
s         1779   3.1%
t          117   0.2%
(space)    117   0.2%
d          117   0.2%
n          117   0.2%

mrd_hct
Categorical

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 16597
Missing (%): 57.6%
Memory size: 1.4 MiB

Top values: Negative 8068; Positive 4135

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 97624
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Positive
2nd row: Positive
3rd row: Negative
4th row: Positive
5th row: Negative

Common Values

Value      Count  Frequency (%)
Negative    8068  28.0%
Positive    4135  14.4%
(Missing)  16597  57.6%

Length

Histogram of lengths of the category

[Figure: Common Values (Plot)]

Words

Value     Count  Frequency (%)
negative   8068  66.1%
positive   4135  33.9%

Most occurring characters

Value  Count  Frequency (%)
e      20271  20.8%
i      16338  16.7%
t      12203  12.5%
v      12203  12.5%
N       8068   8.3%
g       8068   8.3%
a       8068   8.3%
P       4135   4.2%
o       4135   4.2%
s       4135   4.2%

Distinct: 2
Distinct (%): < 0.1%
Missing: 225
Missing (%): 0.8%
Memory size: 56.4 KiB

Value      Count  Frequency (%)
False      17591  61.1%
True       10984  38.1%
(Missing)    225   0.8%

tce_match
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 18996
Missing (%): 66.0%
Memory size: 1.4 MiB

Value               Count
Permissive          6272
GvH non-permissive  1605
Fully matched       1059
HvG non-permissive  868
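
The percentages reported for tce_match use all 28800 observations as the denominator, so the four observed categories together account for only about a third of the rows. A minimal sketch, assuming the counts listed for this variable, that recomputes the missing share and also shows each category's share among observed rows (a derived figure that does not appear in the report):

```python
N_OBS = 28800    # "Number of observations" from the dataset overview
MISSING = 18996  # missing cells reported for tce_match
counts = {
    "Permissive": 6272,
    "GvH non-permissive": 1605,
    "Fully matched": 1059,
    "HvG non-permissive": 868,
}

observed = sum(counts.values())
# Observed plus missing rows should cover the whole dataset.
assert observed + MISSING == N_OBS

missing_pct = 100 * MISSING / N_OBS
print(f"missing: {missing_pct:.1f}% of all rows")
for value, n in counts.items():
    share_all = 100 * n / N_OBS
    share_observed = 100 * n / observed
    print(f"{value}: {share_all:.1f}% of all rows, {share_observed:.1f}% of observed rows")
```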

Length

Max length18
Median length10
Mean length12.342003
Min length10

Characters and Unicode

Total characters121001
Distinct characters23
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPermissive
2nd rowPermissive
3rd rowPermissive
4th rowPermissive
5th rowPermissive

Common Values

Value               Count  Frequency (%)
Permissive          6272   21.8%
GvH non-permissive  1605   5.6%
Fully matched       1059   3.7%
HvG non-permissive  868    3.0%
(Missing)           18996  66.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value           Count  Frequency (%)
permissive      6272   47.0%
non-permissive  2473   18.5%
gvh             1605   12.0%
fully           1059   7.9%
matched         1059   7.9%
hvg             868    6.5%

Most occurring characters

Value              Count  Frequency (%)
e                  18549  15.3%
i                  17490  14.5%
s                  17490  14.5%
v                  11218  9.3%
m                  9804   8.1%
r                  8745   7.2%
P                  6272   5.2%
n                  4946   4.1%
(space)            3532   2.9%
p                  2473   2.0%
Other values (13)  20482  16.9%

Most occurring categories, scripts, and blocks

Value      Count   Frequency (%)
(unknown)  121001  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

hla_match_a_high
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 4301
Missing (%): 14.9%
Memory size: 1.6 MiB

Value  Count
2.0    17304
1.0    7132
0.0    63

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters73497
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2.0
2nd row2.0
3rd row2.0
4th row2.0
5th row2.0

Common Values

Value      Count  Frequency (%)
2.0        17304  60.1%
1.0        7132   24.8%
0.0        63     0.2%
(Missing)  4301   14.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
2.0    17304  70.6%
1.0    7132   29.1%
0.0    63     0.3%

Most occurring characters

Value  Count  Frequency (%)
0      24562  33.4%
.      24499  33.3%
2      17304  23.5%
1      7132   9.7%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  73497  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

hepatic_severe
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 1871
Missing (%): 6.5%
Memory size: 1.6 MiB

Value     Count
No        25238
Yes       1481
Not done  210
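
The IMBALANCE alert reflects how heavily "No" dominates this variable. The report does not state how the alert is computed, so the sketch below uses normalized Shannon entropy as an assumed, commonly used balance measure (1.0 means perfectly balanced, values near 0 mean one class dominates). The function name normalized_entropy is illustrative only.

```python
import math

# Observed counts reported for hepatic_severe (missing rows excluded).
counts = {"No": 25238, "Yes": 1481, "Not done": 210}


def normalized_entropy(class_counts: dict) -> float:
    """Shannon entropy of the class shares, divided by its maximum log2(k)."""
    total = sum(class_counts.values())
    probs = [c / total for c in class_counts.values() if c > 0]
    if len(probs) < 2:
        return 0.0  # a single class carries no uncertainty
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(probs))


balance = normalized_entropy(counts)
print(f"normalized entropy: {balance:.3f}")
```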

Length

Max length8
Median length2
Mean length2.1017862
Min length2

Characters and Unicode

Total characters56599
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowNo
3rd rowNo
4th rowNo
5th rowNo

Common Values

Value      Count  Frequency (%)
No         25238  87.6%
Yes        1481   5.1%
Not done   210    0.7%
(Missing)  1871   6.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
no     25238  93.0%
yes    1481   5.5%
not    210    0.8%
done   210    0.8%

Most occurring characters

Value    Count  Frequency (%)
o        25658  45.3%
N        25448  45.0%
e        1691   3.0%
Y        1481   2.6%
s        1481   2.6%
t        210    0.4%
(space)  210    0.4%
d        210    0.4%
n        210    0.4%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  56599  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

donor_age
Real number (ℝ)

MISSING 

Distinct: 20909
Distinct (%): 77.5%
Missing: 1808
Missing (%): 6.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.511591
Minimum: 18.01
Maximum: 84.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 18.01
5-th percentile: 22.33655
Q1: 28.447
Median: 40.063
Q3: 56.1315
95-th percentile: 67.5913
Maximum: 84.8
Range: 66.79
Interquartile range (IQR): 27.6845

Descriptive statistics

Standard deviation: 15.251434
Coefficient of variation (CV): 0.35875943
Kurtosis: -1.1841265
Mean: 42.511591
Median Absolute Deviation (MAD): 13.114
Skewness: 0.29620496
Sum: 1147472.9
Variance: 232.60624
Monotonicity: Not monotonic
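
Several of the donor_age figures are derived from the others. A small sketch, assuming only the values printed in this section, that recomputes the range, IQR, coefficient of variation, and variance so they can be checked against the report:

```python
# Values copied from the donor_age statistics in this section.
minimum, maximum = 18.01, 84.8
q1, q3 = 28.447, 56.1315
mean, std = 42.511591, 15.251434

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

print(f"range={value_range:.2f} iqr={iqr:.4f} cv={cv:.8f} variance={variance:.5f}")
```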
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
23.57                 7      < 0.1%
24.504                6      < 0.1%
25.378                6      < 0.1%
38.722                5      < 0.1%
36.313                5      < 0.1%
23.725                5      < 0.1%
25.185                5      < 0.1%
23.546                5      < 0.1%
23.718                5      < 0.1%
53.7                  5      < 0.1%
Other values (20899)  26938  93.5%
(Missing)             1808   6.3%

Smallest values

Value   Count  Frequency (%)
18.01   1      < 0.1%
18.012  1      < 0.1%
18.016  1      < 0.1%
18.02   2      < 0.1%
18.023  1      < 0.1%
18.027  1      < 0.1%
18.05   1      < 0.1%
18.061  1      < 0.1%
18.071  1      < 0.1%
18.085  1      < 0.1%

Largest values

Value   Count  Frequency (%)
84.8    1      < 0.1%
82.509  1      < 0.1%
81.539  1      < 0.1%
81.503  1      < 0.1%
81.501  1      < 0.1%
81.201  1      < 0.1%
80.813  1      < 0.1%
80.745  1      < 0.1%
80.288  1      < 0.1%
80.252  1      < 0.1%

prior_tumor
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 1678
Missing (%): 5.8%
Memory size: 1.6 MiB

Value     Count
No        23828
Yes       3009
Not done  285

Length

Max length8
Median length2
Mean length2.1739916
Min length2

Characters and Unicode

Total characters58963
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowNo
3rd rowNo
4th rowNo
5th rowNo

Common Values

Value      Count  Frequency (%)
No         23828  82.7%
Yes        3009   10.4%
Not done   285    1.0%
(Missing)  1678   5.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
no     23828  86.9%
yes    3009   11.0%
not    285    1.0%
done   285    1.0%

Most occurring characters

Value    Count  Frequency (%)
o        24398  41.4%
N        24113  40.9%
e        3294   5.6%
Y        3009   5.1%
s        3009   5.1%
t        285    0.5%
(space)  285    0.5%
d        285    0.5%
n        285    0.5%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  58963  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

hla_match_b_low
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2565
Missing (%): 8.9%
Memory size: 1.6 MiB

Value  Count
2.0    18951
1.0    7220
0.0    64

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters78705
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2.0
2nd row2.0
3rd row2.0
4th row2.0
5th row2.0

Common Values

Value      Count  Frequency (%)
2.0        18951  65.8%
1.0        7220   25.1%
0.0        64     0.2%
(Missing)  2565   8.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
2.0    18951  72.2%
1.0    7220   27.5%
0.0    64     0.2%

Most occurring characters

Value  Count  Frequency (%)
0      26299  33.4%
.      26235  33.3%
2      18951  24.1%
1      7220   9.2%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  78705  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

peptic_ulcer
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2419
Missing (%): 8.4%
Memory size: 1.6 MiB

Value     Count
No        25956
Yes       259
Not done  166

Length

Max length8
Median length2
Mean length2.0475721
Min length2

Characters and Unicode

Total characters54017
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowNo
3rd rowNo
4th rowNo
5th rowNo

Common Values

Value      Count  Frequency (%)
No         25956  90.1%
Yes        259    0.9%
Not done   166    0.6%
(Missing)  2419   8.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
no     25956  97.8%
yes    259    1.0%
not    166    0.6%
done   166    0.6%

Most occurring characters

Value    Count  Frequency (%)
o        26288  48.7%
N        26122  48.4%
e        425    0.8%
Y        259    0.5%
s        259    0.5%
t        166    0.3%
(space)  166    0.3%
d        166    0.3%
n        166    0.3%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  54017  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

age_at_hct
Real number (ℝ)

Distinct: 22168
Distinct (%): 77.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.663162
Minimum: 0.044
Maximum: 73.726
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 0.044
5-th percentile: 0.96185
Q1: 19.539
Median: 41.006
Q3: 55.96525
95-th percentile: 66.33815
Maximum: 73.726
Range: 73.682
Interquartile range (IQR): 36.42625

Descriptive statistics

Standard deviation: 21.147581
Coefficient of variation (CV): 0.54696977
Kurtosis: -1.0649237
Mean: 38.663162
Median Absolute Deviation (MAD): 16.496
Skewness: -0.40381335
Sum: 1113499.1
Variance: 447.22017
Monotonicity: Not monotonic
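
The minimum, 0.044, is also by far the most common value of age_at_hct: the common-values table for this variable lists it 1005 times, while no other listed value occurs more than 6 times. A minimal sketch of how such a spike could be flagged from value counts. The ratio threshold of 20 and the helper name flag_spikes are arbitrary choices for illustration, not something the report defines.

```python
from statistics import median

# Most common age_at_hct values and their counts, as listed in the report.
top_counts = {
    0.044: 1005, 64.47: 6, 15.82: 6, 63.701: 5, 65.184: 5,
    63.217: 5, 50.613: 5, 63.818: 5, 54.496: 5, 37.722: 5,
}


def flag_spikes(counts: dict, ratio: float = 20) -> dict:
    """Return values whose count is at least `ratio` times the median count."""
    typical = median(counts.values())
    return {value: n for value, n in counts.items() if n >= ratio * typical}


print(flag_spikes(top_counts))
```

A value flagged this way may merit a closer look before modelling, for example to check whether it encodes something other than a measured age.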
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
0.044                 1005   3.5%
64.47                 6      < 0.1%
15.82                 6      < 0.1%
63.701                5      < 0.1%
65.184                5      < 0.1%
63.217                5      < 0.1%
50.613                5      < 0.1%
63.818                5      < 0.1%
54.496                5      < 0.1%
37.722                5      < 0.1%
Other values (22158)  27748  96.3%

Smallest values

Value  Count  Frequency (%)
0.044  1005   3.5%
0.046  1      < 0.1%
0.05   1      < 0.1%
0.053  1      < 0.1%
0.057  1      < 0.1%
0.059  1      < 0.1%
0.062  1      < 0.1%
0.066  1      < 0.1%
0.069  1      < 0.1%
0.071  1      < 0.1%

Largest values

Value   Count  Frequency (%)
73.726  1      < 0.1%
73.717  1      < 0.1%
73.67   1      < 0.1%
73.574  1      < 0.1%
73.459  1      < 0.1%
73.458  1      < 0.1%
73.446  1      < 0.1%
73.267  1      < 0.1%
72.925  1      < 0.1%
72.757  1      < 0.1%

hla_match_a_low
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2390
Missing (%): 8.3%
Memory size: 1.6 MiB

Value  Count
2.0    18776
1.0    7585
0.0    49

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters79230
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2.0
2nd row2.0
3rd row2.0
4th row2.0
5th row2.0

Common Values

Value      Count  Frequency (%)
2.0        18776  65.2%
1.0        7585   26.3%
0.0        49     0.2%
(Missing)  2390   8.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
2.0    18776  71.1%
1.0    7585   28.7%
0.0    49     0.2%

Most occurring characters

Value  Count  Frequency (%)
0      26459  33.4%
.      26410  33.3%
2      18776  23.7%
1      7585   9.6%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  79230  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

gvhd_proph
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 0.1%
Missing: 225
Missing (%): 0.8%
Memory size: 2.1 MiB

Value                        Count
FK+ MMF +- others            10440
Cyclophosphamide alone       5270
FK+ MTX +- others(not MMF)   4262
Cyclophosphamide +- others   2369
CSA + MMF +- others(not FK)  2278
Other values (12)            3956

Length

Max length31
Median length29
Mean length20.567874
Min length7

Characters and Unicode

Total characters587727
Distinct characters46
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowFKalone
2nd rowOther GVHD Prophylaxis
3rd rowCyclophosphamide alone
4th rowFK+ MMF +- others
5th rowTDEPLETION +- other

Common Values

Value                        Count  Frequency (%)
FK+ MMF +- others            10440  36.2%
Cyclophosphamide alone       5270   18.3%
FK+ MTX +- others(not MMF)   4262   14.8%
Cyclophosphamide +- others   2369   8.2%
CSA + MMF +- others(not FK)  2278   7.9%
FKalone                      1230   4.3%
Other GVHD Prophylaxis       550    1.9%
TDEPLETION alone             545    1.9%
TDEPLETION +- other          539    1.9%
No GvHD Prophylaxis          262    0.9%
Other values (7)             830    2.9%

Length

[Figure: histogram of lengths of the category]

Words

Value              Count  Frequency (%)
(no label)         22754  21.9%
fk                 16981  16.4%
mmf                16980  16.4%
others             12809  12.3%
cyclophosphamide   7639   7.4%
others(not         6788   6.5%
alone              6280   6.1%
mtx                4486   4.3%
csa                2739   2.6%
fkalone            1230   1.2%
Other values (14)  5040   4.9%

Most occurring characters

Value              Count   Frequency (%)
(space)            75151   12.8%
o                  50903   8.7%
M                  38966   6.6%
+                  37395   6.4%
h                  36831   6.3%
e                  36688   6.2%
F                  35686   6.1%
s                  28416   4.8%
t                  28021   4.8%
r                  21615   3.7%
Other values (36)  198055  33.7%

Most occurring categories, scripts, and blocks

Value      Count   Frequency (%)
(unknown)  587727  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

rheum_issue
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2183
Missing (%): 7.6%
Memory size: 1.6 MiB

Value     Count
No        26015
Yes       457
Not done  145

Length

Max length8
Median length2
Mean length2.0498554
Min length2

Characters and Unicode

Total characters54561
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowNo
3rd rowNo
4th rowNo
5th rowNo

Common Values

Value      Count  Frequency (%)
No         26015  90.3%
Yes        457    1.6%
Not done   145    0.5%
(Missing)  2183   7.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
no     26015  97.2%
yes    457    1.7%
not    145    0.5%
done   145    0.5%

Most occurring characters

Value    Count  Frequency (%)
o        26305  48.2%
N        26160  47.9%
e        602    1.1%
Y        457    0.8%
s        457    0.8%
t        145    0.3%
(space)  145    0.3%
d        145    0.3%
n        145    0.3%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  54561  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

sex_match
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 261
Missing (%): 0.9%
Memory size: 1.6 MiB

Value  Count
M-M    7980
F-M    7822
M-F    6715
F-F    6022

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters85617
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM-F
2nd rowF-F
3rd rowF-M
4th rowM-M
5th rowM-F

Common Values

Value      Count  Frequency (%)
M-M        7980   27.7%
F-M        7822   27.2%
M-F        6715   23.3%
F-F        6022   20.9%
(Missing)  261    0.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
m-m    7980   28.0%
f-m    7822   27.4%
m-f    6715   23.5%
f-f    6022   21.1%

Most occurring characters

Value  Count  Frequency (%)
M      30497  35.6%
-      28539  33.3%
F      26581  31.0%
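
The character counts for sex_match follow directly from the value counts: every occurrence of a value contributes each of its characters once. A short sketch, assuming the four value counts reported for this variable, that reproduces the character table:

```python
from collections import Counter

# Value counts reported for sex_match (missing rows excluded).
value_counts = {"M-M": 7980, "F-M": 7822, "M-F": 6715, "F-F": 6022}

char_counts = Counter()
for value, n in value_counts.items():
    for ch in value:
        char_counts[ch] += n  # each row contributes every character of its value

total = sum(char_counts.values())
for ch, n in char_counts.most_common():
    print(ch, n, f"{100 * n / total:.1f}%")
```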

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  85617  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

hla_match_b_high
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 4088
Missing (%): 14.2%
Memory size: 1.6 MiB

Value  Count
2.0    17366
1.0    7269
0.0    77

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters74136
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2.0
2nd row2.0
3rd row2.0
4th row2.0
5th row2.0

Common Values

Value      Count  Frequency (%)
2.0        17366  60.3%
1.0        7269   25.2%
0.0        77     0.3%
(Missing)  4088   14.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
2.0    17366  70.3%
1.0    7269   29.4%
0.0    77     0.3%

Most occurring characters

Value  Count  Frequency (%)
0      24789  33.4%
.      24712  33.3%
2      17366  23.4%
1      7269   9.8%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  74136  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

race_group
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

Value                             Count
More than one race                4845
Asian                             4832
White                             4831
Black or African-American         4795
American Indian or Alaska Native  4790

Length

Max length41
Median length32
Mean length20.891215
Min length5

Characters and Unicode

Total characters601667
Distinct characters27
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMore than one race
2nd rowAsian
3rd rowMore than one race
4th rowWhite
5th rowAmerican Indian or Alaska Native

Common Values

Value                                      Count  Frequency (%)
More than one race                         4845   16.8%
Asian                                      4832   16.8%
White                                      4831   16.8%
Black or African-American                  4795   16.6%
American Indian or Alaska Native           4790   16.6%
Native Hawaiian or other Pacific Islander  4707   16.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value             Count  Frequency (%)
or                14292  14.9%
native            9497   9.9%
more              4845   5.1%
than              4845   5.1%
one               4845   5.1%
race              4845   5.1%
asian             4832   5.1%
white             4831   5.1%
african-american  4795   5.0%
black             4795   5.0%
Other values (7)  33198  34.7%

Most occurring characters

Value              Count   Frequency (%)
a                  81099   13.5%
(space)            66820   11.1%
i                  57158   9.5%
n                  47896   8.0%
e                  47862   8.0%
r                  47776   7.9%
c                  33434   5.6%
o                  28689   4.8%
A                  24002   4.0%
t                  23880   4.0%
Other values (17)  143051  23.8%

Most occurring categories, scripts, and blocks

Value      Count   Frequency (%)
(unknown)  601667  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

comorbidity_score
Real number (ℝ)

MISSING  ZEROS 

Distinct: 11
Distinct (%): < 0.1%
Missing: 477
Missing (%): 1.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7023267
Minimum: 0
Maximum: 10
Zeros: 10738
Zeros (%): 37.3%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 6
Maximum: 10
Range: 10
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.9944429
Coefficient of variation (CV): 1.1715982
Kurtosis: 2.0673824
Mean: 1.7023267
Median Absolute Deviation (MAD): 1
Skewness: 1.474657
Sum: 48215
Variance: 3.9778023
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=11)]

Common values

Value      Count  Frequency (%)
0          10738  37.3%
2          5899   20.5%
1          4852   16.8%
3          2460   8.5%
4          1396   4.8%
5          1219   4.2%
6          708    2.5%
7          492    1.7%
8          293    1.0%
9          190    0.7%
(Missing)  477    1.7%

Smallest values

Value  Count  Frequency (%)
0      10738  37.3%
1      4852   16.8%
2      5899   20.5%
3      2460   8.5%
4      1396   4.8%
5      1219   4.2%
6      708    2.5%
7      492    1.7%
8      293    1.0%
9      190    0.7%

Largest values

Value  Count  Frequency (%)
10     76     0.3%
9      190    0.7%
8      293    1.0%
7      492    1.7%
6      708    2.5%
5      1219   4.2%
4      1396   4.8%
3      2460   8.5%
2      5899   20.5%
1      4852   16.8%
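
Because comorbidity_score takes only 11 distinct values, its Sum and Mean can be rebuilt from the value counts in the tables for this variable. A minimal sketch, assuming those counts (including the 76 rows with score 10 that appear only among the largest values):

```python
# Score -> count, taken from the comorbidity_score value tables.
value_counts = {
    0: 10738, 1: 4852, 2: 5899, 3: 2460, 4: 1396, 5: 1219,
    6: 708, 7: 492, 8: 293, 9: 190, 10: 76,
}

n = sum(value_counts.values())                                 # non-missing rows
total = sum(score * k for score, k in value_counts.items())    # reported as Sum
mean = total / n                                               # reported as Mean

print(f"n={n} sum={total} mean={mean:.7f}")
```

The non-missing row count plus the 477 missing rows gives the 28800 observations in the dataset overview.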

karnofsky_score
Real number (ℝ)

MISSING 

Distinct: 7
Distinct (%): < 0.1%
Missing: 870
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 83.83208
Minimum: 40
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 40
5-th percentile: 70
Q1: 70
Median: 90
Q3: 90
95-th percentile: 100
Maximum: 100
Range: 60
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 11.02884
Coefficient of variation (CV): 0.1315587
Kurtosis: -0.55721296
Mean: 83.83208
Median Absolute Deviation (MAD): 0
Skewness: -0.68324853
Sum: 2341430
Variance: 121.63531
Monotonicity: Not monotonic
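
A MAD of 0 can look surprising next to a standard deviation of about 11. It arises because more than half of the non-missing karnofsky_score values equal the median (90), so the median of the absolute deviations is 0. A small sketch, assuming the value counts reported for this variable, that reproduces both figures:

```python
from statistics import median

# Score -> count, taken from the karnofsky_score value tables.
value_counts = {90: 15336, 70: 6690, 100: 2476, 80: 2036, 60: 1291, 50: 91, 40: 10}

# Expand the counts back into one entry per non-missing row.
values = [score for score, k in value_counts.items() for _ in range(k)]

med = median(values)
mad = median(abs(v - med) for v in values)  # median absolute deviation

share_at_median = value_counts[90] / len(values)
print(f"median={med} MAD={mad} share equal to median={share_at_median:.1%}")
```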
[Figure: histogram with fixed size bins (bins=7)]

Common values

Value      Count  Frequency (%)
90         15336  53.2%
70         6690   23.2%
100        2476   8.6%
80         2036   7.1%
60         1291   4.5%
50         91     0.3%
40         10     < 0.1%
(Missing)  870    3.0%

Smallest values

Value  Count  Frequency (%)
40     10     < 0.1%
50     91     0.3%
60     1291   4.5%
70     6690   23.2%
80     2036   7.1%
90     15336  53.2%
100    2476   8.6%

Largest values

Value  Count  Frequency (%)
100    2476   8.6%
90     15336  53.2%
80     2036   7.1%
70     6690   23.2%
60     1291   4.5%
50     91     0.3%
40     10     < 0.1%

hepatic_mild
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 1917
Missing (%): 6.7%
Memory size: 1.6 MiB

Value     Count
No        24989
Yes       1754
Not done  140

Length

Max length8
Median length2
Mean length2.0964922
Min length2

Characters and Unicode

Total characters56360
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo
2nd rowNo
3rd rowNo
4th rowYes
5th rowNo

Common Values

Value      Count  Frequency (%)
No         24989  86.8%
Yes        1754   6.1%
Not done   140    0.5%
(Missing)  1917   6.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value  Count  Frequency (%)
no     24989  92.5%
yes    1754   6.5%
not    140    0.5%
done   140    0.5%

Most occurring characters

Value    Count  Frequency (%)
o        25269  44.8%
N        25129  44.6%
e        1894   3.4%
Y        1754   3.1%
s        1754   3.1%
t        140    0.2%
(space)  140    0.2%
d        140    0.2%
n        140    0.2%

Most occurring categories, scripts, and blocks

Value      Count  Frequency (%)
(unknown)  56360  100.0%

The most frequent characters per category, per script, and per block are identical to the "Most occurring characters" table.

tce_div_match
Categorical

HIGH CORRELATION  MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 11396
Missing (%): 39.6%
Memory size: 1.7 MiB

Value                          Count
Permissive mismatched          12936
GvH non-permissive             2458
HvG non-permissive             1417
Bi-directional non-permissive  593

Length

Max length29
Median length21
Mean length20.604631
Min length18

Characters and Unicode

Total characters358603
Distinct characters21
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPermissive mismatched
2nd rowPermissive mismatched
3rd rowPermissive mismatched
4th rowPermissive mismatched
5th rowPermissive mismatched

Common Values

Value | Count | Frequency (%)
Permissive mismatched | 12936 | 44.9%
GvH non-permissive | 2458 | 8.5%
HvG non-permissive | 1417 | 4.9%
Bi-directional non-permissive | 593 | 2.1%
(Missing) | 11396 | 39.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
permissive | 12936 | 37.2%
mismatched | 12936 | 37.2%
non-permissive | 4468 | 12.8%
gvh | 2458 | 7.1%
hvg | 1417 | 4.1%
bi-directional | 593 | 1.7%

Most occurring characters

Value | Count | Frequency (%)
i | 49523 | 13.8%
e | 48337 | 13.5%
s | 47744 | 13.3%
m | 43276 | 12.1%
v | 21279 | 5.9%
r | 17997 | 5.0%
(space) | 17404 | 4.9%
c | 13529 | 3.8%
d | 13529 | 3.8%
a | 13529 | 3.8%
Other values (11) | 72456 | 20.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 358603 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 358603 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 358603 | 100.0%

Most frequent character per category, script, and block: same rows as Most occurring characters, because every character falls in the single (unknown) group.
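
The word-level counts reported for tce_div_match (permissive, mismatched, non-permissive, and so on) can be recomputed from the four category counts. The sketch below assumes that words come from lowercasing each label and splitting on whitespace; that tokenization is inferred from the listed words, not stated by the report, and `category_counts` and `word_frequencies` are illustrative names.

```python
from collections import Counter

# Category counts copied from the tce_div_match Common Values table
# (missing rows are excluded because they contain no text).
category_counts = {
    "Permissive mismatched": 12936,
    "GvH non-permissive": 2458,
    "HvG non-permissive": 1417,
    "Bi-directional non-permissive": 593,
}


def word_frequencies(counts):
    """Sum per-word totals, assuming lowercase + whitespace tokenization."""
    words = Counter()
    for label, n in counts.items():
        for word in label.lower().split():
            words[word] += n
    return words


words = word_frequencies(category_counts)
total_words = sum(words.values())

for word, n in words.most_common():
    print(f"{word} | {n} | {100 * n / total_words:.1f}%")
```

Under this assumption "non-permissive" collects the counts of the three non-permissive labels, which is why it appears as a single word row.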

donor_related
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 158
Missing (%): 0.5%
Memory size: 1.8 MiB

Value | Count
Related | 16208
Unrelated | 12088
Multiple donor (non-UCB) | 346

Length

Max length: 24
Median length: 7
Mean length: 8.0494379
Min length: 7

Characters and Unicode

Total characters: 230552
Distinct characters: 20
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unrelated
2nd row: Related
3rd row: Related
4th row: Unrelated
5th row: Related

Common Values

Value | Count | Frequency (%)
Related | 16208 | 56.3%
Unrelated | 12088 | 42.0%
Multiple donor (non-UCB) | 346 | 1.2%
(Missing) | 158 | 0.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
related | 16208 | 55.3%
unrelated | 12088 | 41.2%
multiple | 346 | 1.2%
donor | 346 | 1.2%
non-ucb | 346 | 1.2%

Most occurring characters

Value | Count | Frequency (%)
e | 56938 | 24.7%
l | 28988 | 12.6%
t | 28642 | 12.4%
d | 28642 | 12.4%
a | 28296 | 12.3%
R | 16208 | 7.0%
n | 13126 | 5.7%
U | 12434 | 5.4%
r | 12434 | 5.4%
o | 1038 | 0.5%
Other values (10) | 3806 | 1.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 230552 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 230552 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 230552 | 100.0%

Most frequent character per category, script, and block: same rows as Most occurring characters, because every character falls in the single (unknown) group.
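
The frequency percentages in the donor_related Common Values table are shares of all 28800 observations, with missing rows included in the denominator. A minimal sketch of that arithmetic follows; `counts`, `n_observations`, and `shares` are illustrative names, and the one-decimal rounding is assumed from the way the report prints its percentages.

```python
# Counts copied from the donor_related Common Values table.
counts = {
    "Related": 16208,
    "Unrelated": 12088,
    "Multiple donor (non-UCB)": 346,
    "(Missing)": 158,
}
n_observations = 28800  # number of observations from the dataset overview


def shares(counts, total):
    """Return each count as a percentage of `total`, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# The four rows account for every observation, so the shares cover the dataset.
assert sum(counts.values()) == n_observations

percentages = shares(counts, n_observations)
for label, pct in percentages.items():
    print(f"{label} | {counts[label]} | {pct}%")
```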

melphalan_dose
Categorical

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 1405
Missing (%): 4.9%
Memory size: 1.9 MiB

Value | Count
N/A, Mel not given | 20135
MEL | 7260

Length

Max length: 18
Median length: 18
Mean length: 14.024822
Min length: 3

Characters and Unicode

Total characters: 384210
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: N/A, Mel not given
2nd row: N/A, Mel not given
3rd row: N/A, Mel not given
4th row: N/A, Mel not given
5th row: MEL

Common Values

Value | Count | Frequency (%)
N/A, Mel not given | 20135 | 69.9%
MEL | 7260 | 25.2%
(Missing) | 1405 | 4.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
mel | 27395 | 31.2%
n/a | 20135 | 22.9%
not | 20135 | 22.9%
given | 20135 | 22.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 60405 | 15.7%
e | 40270 | 10.5%
n | 40270 | 10.5%
M | 27395 | 7.1%
N | 20135 | 5.2%
/ | 20135 | 5.2%
A | 20135 | 5.2%
, | 20135 | 5.2%
l | 20135 | 5.2%
o | 20135 | 5.2%
Other values (6) | 95060 | 24.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 384210 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 384210 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 384210 | 100.0%

Most frequent character per category, script, and block: same rows as Most occurring characters, because every character falls in the single (unknown) group.
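
The length statistics for melphalan_dose follow directly from the two labels and their counts. As a rough check, the sketch below recomputes the total character count and the mean length; it assumes lengths are measured in characters of the raw label and that missing rows are excluded. `counts` and the derived variable names are illustrative.

```python
# Counts copied from the melphalan_dose Common Values table (missing excluded).
counts = {"N/A, Mel not given": 20135, "MEL": 7260}

n_values = sum(counts.values())
total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / n_values

print("Total characters:", total_chars)
print("Mean length:", round(mean_length, 6))
print("Min length:", min(len(label) for label in counts))
print("Max length:", max(len(label) for label in counts))
```

Because the 18-character label covers well over half of the non-missing rows, the median length equals the maximum even though the mean is pulled down by the 3-character label.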

hla_low_res_8
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 7
Distinct (%): < 0.1%
Missing: 3653
Missing (%): 12.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.9034477
Minimum: 2
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 6
Median: 8
Q3: 8
95-th percentile: 8
Maximum: 8
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.5650173
Coefficient of variation (CV): 0.22670082
Kurtosis: -0.6619287
Mean: 6.9034477
Median Absolute Deviation (MAD): 0
Skewness: -1.0163772
Sum: 173601
Variance: 2.4492791
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=7)]

Common values

Value | Count | Frequency (%)
8 | 15160 | 52.6%
4 | 4259 | 14.8%
7 | 2603 | 9.0%
5 | 1613 | 5.6%
6 | 1488 | 5.2%
3 | 23 | 0.1%
2 | 1 | < 0.1%
(Missing) | 3653 | 12.7%

Smallest values

Value | Count | Frequency (%)
2 | 1 | < 0.1%
3 | 23 | 0.1%
4 | 4259 | 14.8%
5 | 1613 | 5.6%
6 | 1488 | 5.2%
7 | 2603 | 9.0%
8 | 15160 | 52.6%

Largest values

Value | Count | Frequency (%)
8 | 15160 | 52.6%
7 | 2603 | 9.0%
6 | 1488 | 5.2%
5 | 1613 | 5.6%
4 | 4259 | 14.8%
3 | 23 | 0.1%
2 | 1 | < 0.1%
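
The descriptive statistics for hla_low_res_8 can be recovered from its seven-row frequency table. The sketch below recomputes the sum, mean, variance, standard deviation, and coefficient of variation; it assumes the variance uses the sample (n - 1) denominator, which is a convention inferred from the numbers and not stated in the report. `freq` is an illustrative name.

```python
import math

# value -> count, copied from the hla_low_res_8 frequency table (missing excluded)
freq = {8: 15160, 4: 4259, 7: 2603, 5: 1613, 6: 1488, 3: 23, 2: 1}

n = sum(freq.values())                          # non-missing observations
total = sum(v * c for v, c in freq.items())     # the "Sum" statistic
mean = total / n

# Sample variance with an (n - 1) denominator (assumed convention).
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                 # coefficient of variation

print(f"n={n} sum={total} mean={mean:.7f}")
print(f"variance={variance:.7f} std={std:.7f} cv={cv:.8f}")
```

The non-missing count also ties back to the overview: 28800 observations minus 3653 missing values leaves the 25147 values summed here.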

cardiac
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2542
Missing (%): 8.8%
Memory size: 1.6 MiB

Value | Count
No | 24592
Yes | 1519
Not done | 147

Length

Max length: 8
Median length: 2
Mean length: 2.0914388
Min length: 2

Characters and Unicode

Total characters: 54917
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 24592 | 85.4%
Yes | 1519 | 5.3%
Not done | 147 | 0.5%
(Missing) | 2542 | 8.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
no | 24592 | 93.1%
yes | 1519 | 5.8%
not | 147 | 0.6%
done | 147 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
o | 24886 | 45.3%
N | 24739 | 45.0%
e | 1666 | 3.0%
Y | 1519 | 2.8%
s | 1519 | 2.8%
t | 147 | 0.3%
(space) | 147 | 0.3%
d | 147 | 0.3%
n | 147 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 54917 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 54917 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 54917 | 100.0%

Most frequent character per category, script, and block: same rows as Most occurring characters, because every character falls in the single (unknown) group.
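
The character-level counts for cardiac are a weighted tally of the characters in its three labels. A small sketch of that tally, assuming case-sensitive counting with the space in "Not done" counted as a character (both inferred from the listed rows); `category_counts` and `chars` are illustrative names.

```python
from collections import Counter

# Category counts copied from the cardiac Common Values table (missing excluded).
category_counts = {"No": 24592, "Yes": 1519, "Not done": 147}

chars = Counter()
for label, n in category_counts.items():
    for ch in label:
        chars[ch] += n  # every row with this label contributes each character once

total_chars = sum(chars.values())
for ch, n in chars.most_common():
    shown = "(space)" if ch == " " else ch
    print(f"{shown} | {n} | {100 * n / total_chars:.1f}%")
```

The tally gives nine distinct characters, matching the "Distinct characters" figure for this variable.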

hla_match_drb1_high
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 3352
Missing (%): 11.6%
Memory size: 1.6 MiB

Value | Count
2.0 | 18066
1.0 | 7311
0.0 | 71

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 76344
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value | Count | Frequency (%)
2.0 | 18066 | 62.7%
1.0 | 7311 | 25.4%
0.0 | 71 | 0.2%
(Missing) | 3352 | 11.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
2.0 | 18066 | 71.0%
1.0 | 7311 | 28.7%
0.0 | 71 | 0.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 25519 | 33.4%
. | 25448 | 33.3%
2 | 18066 | 23.7%
1 | 7311 | 9.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 76344 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 76344 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 76344 | 100.0%

Most frequent character per category, script, and block: same rows as Most occurring characters, because every character falls in the single (unknown) group.

pulm_moderate
Categorical

IMBALANCE  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 2047
Missing (%): 7.1%
Memory size: 1.6 MiB

Value | Count
No | 21338
Yes | 5249
Not done | 166

Length

Max length: 8
Median length: 2
Mean length: 2.2334318
Min length: 2

Characters and Unicode

Total characters: 59751
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: Yes
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 21338 | 74.1%
Yes | 5249 | 18.2%
Not done | 166 | 0.6%
(Missing) | 2047 | 7.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
no | 21338 | 79.3%
yes | 5249 | 19.5%
not | 166 | 0.6%
done | 166 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
o | 21670 | 36.3%
N | 21504 | 36.0%
e | 5415 | 9.1%
Y | 5249 | 8.8%
s | 5249 | 8.8%
t | 166 | 0.3%
(space) | 166 | 0.3%
d | 166 | 0.3%
n | 166 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 59751 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 59751 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 59751 | 100.0%

Most frequent character per category, script, and block: same rows as Most occurring characters, because every character falls in the single (unknown) group.
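
pulm_moderate carries an IMBALANCE alert, while donor_related, which also has three categories, does not. One way to quantify the difference is an entropy-based score: one minus the Shannon entropy of the class shares divided by its maximum possible value. The formula and the 0.5 cutoff below are assumptions chosen for illustration; the report does not state how its alert is computed. `imbalance_score` and `THRESHOLD` are hypothetical names.

```python
import math

THRESHOLD = 0.5  # assumed cutoff, not taken from the report


def imbalance_score(counts):
    """Return 1 - H/H_max: 0 for perfectly balanced classes, near 1 when one class dominates."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # degenerate single-class case, treated as 0 by convention here
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


# Non-missing category counts copied from the two Common Values tables.
pulm_moderate = [21338, 5249, 166]   # No, Yes, Not done
donor_related = [16208, 12088, 346]  # Related, Unrelated, Multiple donor (non-UCB)

for name, counts in [("pulm_moderate", pulm_moderate), ("donor_related", donor_related)]:
    score = imbalance_score(counts)
    print(f"{name}: score={score:.3f} flagged={score > THRESHOLD}")
```

Under these assumptions pulm_moderate lands just above the cutoff and donor_related well below it, which is consistent with the alerts shown for the two variables.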

hla_low_res_10
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 7
Distinct (%): < 0.1%
Missing: 5064
Missing (%): 17.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.6646866
Minimum: 4
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 4
5-th percentile: 5
Q1: 7
Median: 10
Q3: 10
95-th percentile: 10
Maximum: 10
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.8827462
Coefficient of variation (CV): 0.21728959
Kurtosis: -0.55770349
Mean: 8.6646866
Median Absolute Deviation (MAD): 0
Skewness: -1.0427807
Sum: 205665
Variance: 3.5447331
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=7)]

Common values

Value | Count | Frequency (%)
10 | 13734 | 47.7%
5 | 3211 | 11.1%
9 | 2544 | 8.8%
6 | 1664 | 5.8%
8 | 1387 | 4.8%
7 | 1170 | 4.1%
4 | 26 | 0.1%
(Missing) | 5064 | 17.6%

Smallest values

Value | Count | Frequency (%)
4 | 26 | 0.1%
5 | 3211 | 11.1%
6 | 1664 | 5.8%
7 | 1170 | 4.1%
8 | 1387 | 4.8%
9 | 2544 | 8.8%
10 | 13734 | 47.7%

Largest values

Value | Count | Frequency (%)
10 | 13734 | 47.7%
9 | 2544 | 8.8%
8 | 1387 | 4.8%
7 | 1170 | 4.1%
6 | 1664 | 5.8%
5 | 3211 | 11.1%
4 | 26 | 0.1%
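
The quantile statistics for hla_low_res_10 can be read off the cumulative counts of its frequency table. The sketch below computes quantiles with linear interpolation between neighbouring order statistics; that interpolation rule is an assumption (it is a common default, but the report does not say which rule it uses). `freq` and `quantile_from_counts` are illustrative names.

```python
from itertools import accumulate

# value -> count, copied from the hla_low_res_10 frequency table (missing excluded)
freq = {4: 26, 5: 3211, 6: 1664, 7: 1170, 8: 1387, 9: 2544, 10: 13734}


def quantile_from_counts(freq, q):
    """Quantile of the expanded sample, interpolating linearly between order statistics."""
    values = sorted(freq)
    cumulative = list(accumulate(freq[v] for v in values))
    n = cumulative[-1]

    def order_stat(i):
        # Value at 0-based position i of the sorted sample.
        for v, c in zip(values, cumulative):
            if i < c:
                return v
        raise IndexError(i)

    pos = (n - 1) * q
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return order_stat(lo) + frac * (order_stat(hi) - order_stat(lo))


for label, q in [("5-th percentile", 0.05), ("Q1", 0.25), ("Median", 0.5),
                 ("Q3", 0.75), ("95-th percentile", 0.95)]:
    print(f"{label}: {quantile_from_counts(freq, q):g}")
```

Because almost 58% of the non-missing values equal 10, the median, Q3, and 95-th percentile all coincide with the maximum, which also explains the MAD of 0.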

efs
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value | Count
1.0 | 15532
0.0 | 13268

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 86400
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
1.0 | 15532 | 53.9%
0.0 | 13268 | 46.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
1.0 | 15532 | 53.9%
0.0 | 13268 | 46.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 42068 | 48.7%
. | 28800 | 33.3%
1 | 15532 | 18.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 86400 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 86400 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 86400 | 100.0%

Most frequent character per category, script, and block: same rows as Most occurring characters, because every character falls in the single (unknown) group.

efs_time
Real number (ℝ)

HIGH CORRELATION 

Distinct: 19208
Distinct (%): 66.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.237678
Minimum: 0.333
Maximum: 156.819
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 225.1 KiB

Quantile statistics

Minimum: 0.333
5-th percentile: 3.31695
Q1: 5.61975
Median: 9.7965
Q3: 35.1
95-th percentile: 74.5866
Maximum: 156.819
Range: 156.486
Interquartile range (IQR): 29.48025

Descriptive statistics

Standard deviation: 24.799748
Coefficient of variation (CV): 1.0672214
Kurtosis: 3.063962
Mean: 23.237678
Median Absolute Deviation (MAD): 6.2955
Skewness: 1.7003992
Sum: 669245.13
Variance: 615.02752
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
5.697 | 10 | < 0.1%
5.643 | 10 | < 0.1%
5.801 | 9 | < 0.1%
5.608 | 9 | < 0.1%
5.886 | 9 | < 0.1%
5.089 | 8 | < 0.1%
4.727 | 8 | < 0.1%
5.244 | 8 | < 0.1%
5.033 | 8 | < 0.1%
4.716 | 8 | < 0.1%
Other values (19198) | 28713 | 99.7%

Smallest values

Value | Count | Frequency (%)
0.333 | 1 | < 0.1%
0.482 | 1 | < 0.1%
0.523 | 1 | < 0.1%
0.533 | 1 | < 0.1%
0.543 | 1 | < 0.1%
0.552 | 1 | < 0.1%
0.61 | 1 | < 0.1%
0.612 | 1 | < 0.1%
0.698 | 1 | < 0.1%
0.711 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
156.819 | 1 | < 0.1%
155.983 | 1 | < 0.1%
155.283 | 1 | < 0.1%
154.249 | 1 | < 0.1%
153.711 | 1 | < 0.1%
153.324 | 1 | < 0.1%
153.292 | 1 | < 0.1%
152.132 | 1 | < 0.1%
151.377 | 1 | < 0.1%
150.345 | 1 | < 0.1%
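
Several of the efs_time statistics are derived from others in the same listing, so they can be cross-checked with simple arithmetic. The sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the reported minimum, maximum, quartiles, mean, and standard deviation; the variable names are illustrative, and small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the efs_time statistics.
n_observations = 28800            # efs_time has no missing values
minimum, maximum = 0.333, 156.819
q1, q3 = 5.61975, 35.1
mean, std = 23.237678, 24.799748

value_range = maximum - minimum   # reported Range: 156.486
iqr = q3 - q1                     # reported IQR: 29.48025
cv = std / mean                   # reported CV: 1.0672214
variance = std ** 2               # reported Variance: 615.02752
total = mean * n_observations     # reported Sum: 669245.13

print(f"range={value_range:.3f} iqr={iqr:.5f} cv={cv:.7f}")
print(f"variance={variance:.5f} sum={total:.2f}")
```

A CV above 1 together with a mean far above the median (23.24 versus 9.80) is consistent with the positive skewness reported for this variable.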

Interactions

[Figure: pairwise interaction plots]

Correlations

[Figure: correlation matrix plot]

Variable ID age_at_hct arrhythmia cardiac cmv_status comorbidity_score conditioning_intensity cyto_score cyto_score_detail diabetes donor_age donor_related dri_score efs efs_time ethnicity graft_type gvhd_proph hepatic_mild hepatic_severe hla_high_res_10 hla_high_res_6 hla_high_res_8 hla_low_res_10 hla_low_res_6 hla_low_res_8 hla_match_a_high hla_match_a_low hla_match_b_high hla_match_b_low hla_match_c_high hla_match_c_low hla_match_dqb1_high hla_match_dqb1_low hla_match_drb1_high hla_match_drb1_low hla_nmdp_6 in_vivo_tcd karnofsky_score melphalan_dose mrd_hct obesity peptic_ulcer prim_disease_hct prior_tumor prod_type psych_disturb pulm_moderate pulm_severe race_group renal_issue rheum_issue rituximab sex_match tbi_status tce_div_match tce_imm_match tce_match vent_hist year_hct
ID 1.000 -0.002 0.011 0.012 0.000 0.001 0.000 0.009 0.005 0.013 0.001 0.004 0.002 0.000 0.002 0.011 0.006 0.005 0.000 0.000 -0.013 -0.012 -0.011 -0.012 0.000 -0.011 0.000 0.000 0.000 0.000 0.015 0.004 0.005 0.014 0.004 0.008 0.000 0.008 -0.010 0.000 0.015 0.007 0.010 0.012 0.013 0.011 0.000 0.013 0.000 0.009 0.004 0.000 0.000 0.000 0.010 0.000 0.002 0.009 0.011 0.003
age_at_hct -0.002 1.000 0.091 0.090 0.057 0.249 0.171 0.093 0.072 0.099 0.136 0.038 0.143 0.237 -0.173 0.106 0.481 0.111 0.008 0.082 0.037 0.033 0.027 0.032 0.058 0.028 0.076 0.067 0.062 0.065 0.069 0.072 0.079 0.077 0.077 0.107 0.066 0.286 -0.226 0.206 0.070 0.045 0.013 0.172 0.168 0.488 0.061 0.097 0.045 0.071 0.011 0.012 0.000 0.047 0.080 0.033 0.026 0.045 0.048 -0.002
arrhythmia 0.011 0.091 1.000 0.046 0.020 0.116 0.058 0.018 0.025 0.039 0.033 0.013 0.037 0.046 -0.057 0.014 0.068 0.022 0.009 0.011 0.037 0.040 0.037 0.032 0.028 0.032 0.032 0.031 0.025 0.022 0.022 0.021 0.014 0.021 0.026 0.038 0.035 0.027 -0.057 0.042 0.054 0.012 0.014 0.041 0.071 0.072 0.018 0.033 0.000 0.027 0.006 0.022 0.012 0.019 0.024 0.022 0.032 0.025 0.000 0.001
cardiac 0.012 0.090 0.046 1.000 0.025 0.154 0.045 0.029 0.050 0.046 0.037 0.014 0.059 0.101 -0.096 0.022 0.070 0.029 0.019 0.014 0.017 0.021 0.017 0.016 0.021 0.016 0.018 0.023 0.016 0.025 0.000 0.016 0.015 0.014 0.018 0.022 0.029 0.055 -0.056 0.024 0.014 0.012 0.016 0.077 0.061 0.069 0.030 0.047 0.025 0.029 0.015 0.009 0.000 0.033 0.030 0.022 0.009 0.000 0.008 -0.008
cmv_status 0.000 0.057 0.020 0.025 1.000 0.016 0.025 0.041 0.036 0.019 -0.109 0.095 0.050 0.102 0.001 0.057 0.076 0.093 0.018 0.014 0.186 0.174 0.180 0.181 0.101 0.177 0.093 0.095 0.118 0.115 0.107 0.114 0.099 0.098 0.108 0.146 0.101 0.087 0.004 0.052 0.053 0.018 0.000 0.072 0.048 0.079 0.046 0.015 0.025 0.071 0.011 0.008 0.016 0.059 0.058 0.060 0.061 0.070 0.011 -0.054
comorbidity_score 0.001 0.249 0.116 0.154 0.016 1.000 0.061 0.026 0.057 0.143 0.081 0.036 0.041 0.144 -0.169 0.040 0.119 0.033 0.060 0.039 -0.020 -0.008 -0.014 -0.020 0.000 -0.018 0.008 0.000 0.019 0.017 0.028 0.022 0.032 0.030 0.019 0.024 0.026 0.098 -0.176 0.059 0.048 0.079 0.020 0.058 0.246 0.116 0.105 0.193 0.192 0.035 0.022 0.045 0.000 0.026 0.028 0.039 0.027 0.034 0.037 0.087
conditioning_intensity 0.000 0.171 0.058 0.045 0.025 0.061 1.000 0.064 0.049 0.081 0.066 0.103 0.046 0.121 -0.102 0.069 0.163 0.123 0.019 0.017 0.072 0.066 0.063 0.072 0.144 0.061 0.172 0.172 0.172 0.181 0.189 0.179 0.171 0.165 0.190 0.261 0.143 0.100 -0.138 0.421 0.046 0.030 0.015 0.093 0.081 0.161 0.030 0.024 0.020 0.037 0.020 0.022 0.014 0.028 0.135 0.041 0.040 0.047 0.021 0.064
cyto_score 0.009 0.093 0.018 0.029 0.041 0.026 0.064 1.000 0.261 0.029 -0.027 0.065 0.151 0.027 -0.077 0.111 0.149 0.050 0.025 0.030 -0.068 -0.065 -0.065 -0.071 0.053 -0.066 0.068 0.068 0.054 0.056 0.055 0.056 0.075 0.070 0.079 0.104 0.058 0.023 0.061 0.075 0.139 0.027 0.019 0.192 0.028 0.150 0.016 0.042 0.017 0.044 0.013 0.017 0.018 0.052 0.072 0.044 0.035 0.049 0.026 0.069
cyto_score_detail 0.005 0.072 0.025 0.050 0.036 0.057 0.049 0.261 1.000 0.015 0.025 0.067 0.214 0.104 -0.149 0.062 0.119 0.038 0.030 0.023 -0.006 -0.011 -0.017 -0.010 0.054 -0.017 0.058 0.060 0.049 0.069 0.058 0.063 0.047 0.054 0.068 0.086 0.056 0.032 -0.017 0.032 0.055 0.021 0.016 0.130 0.078 0.122 0.020 0.052 0.016 0.046 0.020 0.017 0.023 0.068 0.034 0.041 0.040 0.042 0.000 0.066
diabetes 0.013 0.099 0.039 0.046 0.019 0.143 0.081 0.029 0.015 1.000 0.013 0.013 0.038 0.071 -0.085 0.022 0.091 0.049 0.026 0.046 -0.029 -0.022 -0.026 -0.026 0.024 -0.026 0.012 0.009 0.030 0.030 0.032 0.027 0.035 0.026 0.023 0.036 0.030 0.048 -0.107 0.088 0.044 0.058 0.006 0.068 0.073 0.093 0.022 0.081 0.033 0.038 0.020 0.024 0.015 0.042 0.025 0.000 0.010 0.000 0.007 0.036
donor_age 0.001 0.136 0.033 0.037 -0.109 0.081 0.066 -0.027 0.025 0.013 1.000 0.280 0.039 0.056 -0.033 0.033 0.144 0.044 0.012 0.040 -0.087 -0.066 -0.078 -0.090 0.080 -0.076 0.101 0.099 0.103 0.101 0.100 0.105 0.107 0.101 0.107 0.143 0.111 0.197 -0.039 0.101 0.066 0.026 0.020 0.048 0.065 0.143 0.003 0.045 0.000 0.062 0.000 0.000 0.024 0.071 0.048 0.040 0.032 0.058 0.025 -0.064
donor_related 0.004 0.038 0.013 0.014 0.095 0.036 0.103 0.065 0.067 0.013 0.280 1.000 0.023 0.034 -0.030 0.075 0.016 0.208 0.027 0.020 0.432 0.401 0.417 0.440 0.309 0.420 0.204 0.212 0.274 0.293 0.292 0.286 0.266 0.257 0.302 0.418 0.323 0.262 -0.073 0.030 0.080 0.035 0.014 0.084 0.022 0.015 0.020 0.031 0.021 0.131 0.007 0.000 0.027 0.093 0.119 0.058 0.071 0.079 0.044 -0.049
dri_score 0.002 0.143 0.037 0.059 0.050 0.041 0.046 0.151 0.214 0.038 0.039 0.023 1.000 0.230 0.170 0.059 0.416 0.085 0.028 0.091 -0.000 0.000 0.004 -0.008 0.037 -0.007 0.037 0.042 0.045 0.040 0.029 0.040 0.029 0.036 0.044 0.079 0.035 0.279 0.142 0.033 0.153 0.039 0.024 0.183 0.066 0.422 0.043 0.065 0.021 0.057 0.009 0.022 0.058 0.065 0.031 0.056 0.033 0.047 0.051 0.075
efs 0.000 0.237 0.046 0.101 0.102 0.144 0.121 0.027 0.104 0.071 0.056 0.034 0.230 1.000 -0.855 0.043 0.224 0.173 0.017 0.085 0.050 0.047 0.047 0.047 0.052 0.046 0.021 0.015 0.025 0.040 0.031 0.014 0.040 0.031 0.074 0.067 0.053 0.164 -0.095 0.001 0.025 0.041 0.025 0.273 0.100 0.223 0.066 0.070 0.080 0.098 0.021 0.008 0.000 0.085 0.054 0.035 0.028 0.049 0.000 -0.111
efs_time0.002-0.173-0.057-0.0960.001-0.169-0.102-0.077-0.149-0.085-0.033-0.0300.170-0.8551.0000.0520.2110.0660.0180.0600.0220.0260.0270.0190.0470.0210.0520.0540.0560.0610.0580.0560.0500.0440.0770.1000.0530.1600.1310.0300.0600.0310.0160.0880.0690.2110.0460.0500.0540.0430.0120.0210.0120.0530.0300.0460.0320.0510.011-0.128
ethnicity0.0110.1060.0140.0220.0570.0400.0690.1110.0620.0220.0330.0750.0590.0430.0521.0000.0700.0750.0420.0340.1170.1150.1130.1160.0800.1090.0690.0630.0600.0550.0420.0500.0560.0570.0750.0980.0790.027-0.0640.0160.0120.0000.0110.0910.0500.0730.0040.0240.0100.0570.0090.0000.0120.0150.0930.0540.0610.0410.008-0.065
graft_type0.0060.4810.0680.0700.0760.1190.1630.1490.1190.0910.1440.0160.4160.2240.2110.0701.0000.3020.0220.1270.0620.0640.0570.0630.0820.0680.0640.0690.0000.0250.0070.0040.0380.0050.0880.1000.0990.233-0.1960.0400.0580.0530.0130.4140.1190.8660.0400.0580.0670.1340.0000.0000.0220.0530.0910.0320.0500.0270.063-0.018
gvhd_proph0.0050.1110.0220.0290.0930.0330.1230.0500.0380.0490.0440.2080.0850.1730.0660.0750.3021.0000.0240.0600.3700.3540.3610.3610.3320.3570.3890.3950.4090.4260.4100.4140.3740.3460.4170.5850.3240.290-0.0120.1080.0950.0380.0220.0900.0380.2970.0490.0860.0250.0940.0230.0000.0290.0420.1270.0580.0450.0750.062-0.111
hepatic_mild0.0000.0080.0090.0190.0180.0600.0190.0250.0300.0260.0120.0270.0280.0170.0180.0420.0220.0241.0000.0240.0170.0190.0180.0170.0510.0150.0220.0180.0240.0320.0250.0230.0250.0380.0210.0210.0290.046-0.0010.0270.0000.0270.0240.0330.0180.0190.0270.0400.0160.0310.0180.0160.0050.0000.0060.0000.0210.0020.0150.023
hepatic_severe0.0000.0820.0110.0140.0140.0390.0170.0300.0230.0460.0400.0200.0910.0850.0600.0340.1270.0600.0241.000-0.006-0.002-0.003-0.0030.030-0.0050.0120.0150.0170.0180.0130.0250.0000.0100.0200.0200.0180.1010.0140.0530.0000.0180.0190.1160.0130.1320.0080.0000.0210.0500.0010.0270.0000.0080.0280.0290.0280.0300.0370.041
hla_high_res_10-0.0130.0370.0370.0170.186-0.0200.072-0.068-0.006-0.029-0.0870.432-0.0000.0500.0220.1170.0620.3700.017-0.0061.0000.9640.9750.8820.5390.8810.6090.5490.6310.6080.6330.5890.6140.5110.6360.8330.5110.244-0.0410.0970.1440.0330.0280.0770.0460.0640.0490.0360.0380.1250.0300.0210.0290.0670.1570.0370.0340.0500.050-0.158
hla_high_res_6-0.0120.0330.0400.0210.174-0.0080.066-0.065-0.011-0.022-0.0660.4010.0000.0470.0260.1150.0640.3540.019-0.0020.9641.0000.9880.8770.5250.8800.6840.5600.6520.5900.5510.5540.4950.4730.6650.8030.4970.223-0.0400.0900.1300.0310.0140.0850.0430.0660.0490.0250.0360.1190.0710.0150.0170.0610.1770.0310.0390.0400.061-0.161
hla_high_res_8-0.0110.0270.0370.0170.180-0.0140.063-0.065-0.017-0.026-0.0780.4170.0040.0470.0270.1130.0570.3610.018-0.0030.9750.9881.0000.8830.5380.8860.6450.5540.6400.6050.6480.5830.5080.4900.6400.8160.5110.237-0.0360.0900.1430.0320.0240.0780.0450.0610.0510.0310.0380.1200.0520.0160.0210.0620.1640.0390.0360.0510.055-0.167
hla_low_res_10-0.0120.0320.0320.0160.181-0.0200.072-0.071-0.010-0.026-0.0900.440-0.0080.0470.0190.1160.0630.3610.017-0.0030.8820.8770.8831.0000.7130.9760.5350.5940.5770.6490.5800.6310.5200.6200.5930.8860.5130.242-0.0480.0950.1340.0310.0260.0800.0450.0740.0480.0260.0440.1210.0310.0270.0200.0640.1670.0390.0410.0550.053-0.164
hla_low_res_60.0000.0580.0280.0210.1010.0000.1440.0530.0540.0240.0800.3090.0370.0520.0470.0800.0820.3320.0510.0300.5390.5250.5380.7131.0000.9830.5460.6800.5590.6780.5510.5560.4960.4710.5810.8950.5050.218-0.0460.0860.1230.0330.0240.0960.0380.0840.0450.0240.0370.1280.0220.0180.0240.0580.1930.0320.0460.0480.050-0.166
hla_low_res_8-0.0110.0280.0320.0160.177-0.0180.061-0.066-0.017-0.026-0.0760.420-0.0070.0460.0210.1090.0680.3570.015-0.0050.8810.8800.8860.9760.9831.0000.5400.6220.5740.6630.5760.6700.5050.4830.5860.8840.5140.232-0.0420.0860.1380.0310.0220.0770.0380.0760.0440.0250.0380.1160.0230.0160.0280.0610.1590.0410.0460.0560.051-0.170
hla_match_a_high0.0000.0760.0320.0180.0930.0080.1720.0680.0580.0120.1010.2040.0370.0210.0520.0690.0640.3890.0220.0120.6090.6840.6450.5350.5460.5401.0000.5520.4450.4580.4160.4210.3850.3630.4540.6270.5600.160-0.0320.0980.0980.0330.0110.1280.0400.0630.0390.0270.0250.1620.0160.0140.0130.0650.2450.0430.0630.0850.037-0.133
hla_match_a_low0.0000.0670.0310.0230.0950.0000.1720.0680.0600.0090.0990.2120.0420.0150.0540.0630.0690.3950.0180.0150.5490.5600.5540.5940.6800.6220.5521.0000.4520.4720.4280.4300.3910.3630.4690.6470.5690.162-0.0280.0840.1020.0330.0120.1180.0300.0700.0380.0180.0240.1580.0060.0120.0040.0620.2400.0480.0660.0940.041-0.144
hla_match_b_high0.0000.0620.0250.0160.1180.0190.1720.0540.0490.0300.1030.2740.0450.0250.0560.0600.0000.4090.0240.0170.6310.6520.6400.5770.5590.5740.4450.4521.0000.5480.5180.5200.4390.4130.5080.6950.5450.221-0.0200.0440.1170.0250.0190.1050.0350.0000.0520.0260.0270.1470.0180.0150.0070.0590.2240.0800.0840.0810.049-0.153
hla_match_b_low0.0000.0650.0220.0250.1150.0170.1810.0560.0690.0300.1010.2930.0400.0400.0610.0550.0250.4260.0320.0180.6080.5900.6050.6490.6780.6630.4580.4720.5481.0000.5340.5400.4570.4350.5350.7400.5750.227-0.0300.0570.1090.0260.0150.1140.0330.0230.0410.0230.0380.1430.0090.0110.0000.0530.2310.0520.0710.0600.053-0.161
hla_match_c_high0.0150.0690.0220.0000.1070.0280.1890.0550.0580.0320.1000.2920.0290.0310.0580.0420.0070.4100.0250.0130.6330.5510.6480.5800.5510.5760.4160.4280.5180.5341.0000.5450.4500.4370.5010.7010.5460.222-0.0210.0720.1380.0250.0210.0980.0330.0000.0410.0280.0380.1330.0160.0220.0120.0490.2370.0750.0820.0890.041-0.166
hla_match_c_low0.0040.0720.0210.0160.1140.0220.1790.0560.0630.0270.1050.2860.0400.0140.0560.0500.0040.4140.0230.0250.5890.5540.5830.6310.5560.6700.4210.4300.5200.5400.5451.0000.4470.4280.4970.7000.5520.228-0.0210.0660.1290.0250.0100.1090.0290.0000.0380.0250.0320.1310.0050.0160.0120.0510.2300.0770.0850.0860.047-0.161
hla_match_dqb1_high0.0050.0790.0140.0150.0990.0320.1710.0750.0470.0350.1070.2660.0290.0400.0500.0560.0380.3740.0250.0000.6140.4950.5080.5200.4960.5050.3850.3910.4390.4570.4500.4471.0000.4360.4700.6700.4870.191-0.0370.0880.0800.0210.0140.1180.0290.0360.0350.0230.0350.1360.0140.0050.0000.0510.2390.0370.0530.0410.035-0.119
hla_match_dqb1_low0.0140.0770.0210.0140.0980.0300.1650.0700.0540.0260.1010.2570.0360.0310.0440.0570.0050.3460.0380.0100.5110.4730.4900.6200.4710.4830.3630.3630.4130.4350.4370.4280.4361.0000.4440.6250.4680.186-0.0370.0810.0830.0180.0150.1120.0310.0100.0350.0290.0400.1230.0160.0210.0110.0490.2300.0590.0620.0580.030-0.120
hla_match_drb1_high0.0040.0770.0260.0180.1080.0190.1900.0790.0680.0230.1070.3020.0440.0740.0770.0750.0880.4170.0210.0200.6360.6650.6400.5930.5810.5860.4540.4690.5080.5350.5010.4970.4700.4441.0000.7720.5620.192-0.0520.0920.1080.0280.0130.1210.0350.0870.0400.0220.0310.1480.0270.0190.0000.0590.2560.0290.0390.0340.043-0.153
hla_match_drb1_low0.0080.1070.0380.0220.1460.0240.2610.1040.0860.0360.1430.4180.0790.0670.1000.0980.1000.5850.0210.0200.8330.8030.8160.8860.8950.8840.6270.6470.6950.7400.7010.7000.6700.6250.7721.0000.7920.176-0.0580.0810.0940.0450.0230.1660.0550.1020.0490.0270.0470.2050.0180.0160.0150.0830.3510.0520.0610.0650.032-0.153
hla_nmdp_60.0000.0660.0350.0290.1010.0260.1430.0580.0560.0300.1110.3230.0350.0530.0530.0790.0990.3240.0290.0180.5110.4970.5110.5130.5050.5140.5600.5690.5450.5750.5460.5520.4870.4680.5620.7921.0000.232-0.0440.0900.1400.0350.0160.0920.0470.1000.0480.0280.0350.1290.0280.0190.0090.0630.1930.0470.0500.0720.054-0.148
in_vivo_tcd0.0080.2860.0270.0550.0870.0980.1000.0230.0320.0480.1970.2620.2790.1640.1600.0270.2330.2900.0460.1010.2440.2230.2370.2420.2180.2320.1600.1620.2210.2270.2220.2280.1910.1860.1920.1760.2321.0000.0660.0680.0560.0100.0000.3690.0450.2370.0430.1180.0280.0660.0130.0000.0140.0420.1510.0350.0230.0480.039-0.069
karnofsky_score-0.010-0.226-0.057-0.0560.004-0.176-0.1380.061-0.017-0.107-0.039-0.0730.142-0.0950.131-0.064-0.196-0.012-0.0010.014-0.041-0.040-0.036-0.048-0.046-0.042-0.032-0.028-0.020-0.030-0.021-0.021-0.037-0.037-0.052-0.058-0.0440.0661.0000.0770.0740.0440.0000.0910.0860.2100.0370.0610.0250.0440.0000.0110.0220.0280.0340.0230.0230.0260.022-0.022
melphalan_dose0.0000.2060.0420.0240.0520.0590.4210.0750.0320.0880.1010.0300.0330.0010.0300.0160.0400.1080.0270.0530.0970.0900.0900.0950.0860.0860.0980.0840.0440.0570.0720.0660.0880.0810.0920.0810.0900.0680.0771.0000.0610.0350.0080.1120.0420.0390.0000.0160.0000.0790.0170.0270.0190.0290.2240.0890.0810.0820.000-0.072
mrd_hct0.0150.0700.0540.0140.0530.0480.0460.1390.0550.0440.0660.0800.1530.0250.0600.0120.0580.0950.0000.0000.1440.1300.1430.1340.1230.1380.0980.1020.1170.1090.1380.1290.0800.0830.1080.0940.1400.0560.0740.0611.0000.0250.0000.1590.0710.0470.0120.0300.0240.0350.0000.0090.0060.0400.0950.0480.0590.0690.030-0.054
obesity0.0070.0450.0120.0120.0180.0790.0300.0270.0210.0580.0260.0350.0390.0410.0310.0000.0530.0380.0270.0180.0330.0310.0320.0310.0330.0310.0330.0330.0250.0260.0250.0250.0210.0180.0280.0450.0350.0100.0440.0350.0251.0000.0090.0470.0210.0510.0350.0310.0420.0200.0200.0250.0190.0290.0290.0330.0370.0360.0330.029
peptic_ulcer0.0100.0130.0140.0160.0000.0200.0150.0190.0160.0060.0200.0140.0240.0250.0160.0110.0130.0220.0240.0190.0280.0140.0240.0260.0240.0220.0110.0120.0190.0150.0210.0100.0140.0150.0130.0230.0160.0000.0000.0080.0000.0091.0000.0090.0000.0020.0200.0000.0000.0260.0090.0250.0090.0000.0070.0120.0270.0160.004-0.004
prim_disease_hct0.0120.1720.0410.0770.0720.0580.0930.1920.1300.0680.0480.0840.1830.2730.0880.0910.4140.0900.0330.1160.0770.0850.0780.0800.0960.0770.1280.1180.1050.1140.0980.1090.1180.1120.1210.1660.0920.3690.0910.1120.1590.0470.0091.0000.1050.4130.0750.0910.0260.0780.0000.0390.0810.0500.0800.0870.0540.0950.053-0.022
prior_tumor0.0130.1680.0710.0610.0480.2460.0810.0280.0780.0730.0650.0220.0660.1000.0690.0500.1190.0380.0180.0130.0460.0430.0450.0450.0380.0380.0400.0300.0350.0330.0330.0290.0290.0310.0350.0550.0470.0450.0860.0420.0710.0210.0000.1051.0000.1210.0340.0620.0490.0500.0100.0210.0000.0310.0510.0250.0250.0310.000-0.001
prod_type0.0110.4880.0720.0690.0790.1160.1610.1500.1220.0930.1430.0150.4220.2230.2110.0730.8660.2970.0190.1320.0640.0660.0610.0740.0840.0760.0630.0700.0000.0230.0000.0000.0360.0100.0870.1020.1000.2370.2100.0390.0470.0510.0020.4130.1211.0000.0420.0640.0570.1380.0000.0000.0190.0590.0870.0250.0550.0190.064-0.022
psych_disturb0.0000.0610.0180.0300.0460.1050.0300.0160.0200.0220.0030.0200.0430.0660.0460.0040.0400.0490.0270.0080.0490.0490.0510.0480.0450.0440.0390.0380.0520.0410.0410.0380.0350.0350.0400.0490.0480.0430.0370.0000.0120.0350.0200.0750.0340.0421.0000.0160.0500.0390.0000.0190.0000.0500.0250.0510.0490.0550.0150.002
pulm_moderate0.0130.0970.0330.0470.0150.1930.0240.0420.0520.0810.0450.0310.0650.0700.0500.0240.0580.0860.0400.0000.0360.0250.0310.0260.0240.0250.0270.0180.0260.0230.0280.0250.0230.0290.0220.0270.0280.1180.0610.0160.0300.0310.0000.0910.0620.0640.0161.0000.0320.0370.0240.0130.0130.0430.0140.0700.0790.0690.0180.056
pulm_severe0.0000.0450.0000.0250.0250.1920.0200.0170.0160.0330.0000.0210.0210.0800.0540.0100.0670.0250.0160.0210.0380.0360.0380.0440.0370.0380.0250.0240.0270.0380.0380.0320.0350.0400.0310.0470.0350.0280.0250.0000.0240.0420.0000.0260.0490.0570.0500.0321.0000.0240.0070.0210.0040.0250.0210.0100.0130.0150.0200.019
race_group0.0090.0710.0270.0290.0710.0350.0370.0440.0460.0380.0620.1310.0570.0980.0430.0570.1340.0940.0310.0500.1250.1190.1200.1210.1280.1160.1620.1580.1470.1430.1330.1310.1360.1230.1480.2050.1290.0660.0440.0790.0350.0200.0260.0780.0500.1380.0390.0370.0241.0000.0220.0200.0470.0570.0490.0430.0440.0450.030-0.017
renal_issue0.0040.0110.0060.0150.0110.0220.0200.0130.0200.0200.0000.0070.0090.0210.0120.0090.0000.0230.0180.0010.0300.0710.0520.0310.0220.0230.0160.0060.0180.0090.0160.0050.0140.0160.0270.0180.0280.0130.0000.0170.0000.0200.0090.0000.0100.0000.0000.0240.0070.0221.0000.0000.0080.0140.0190.0000.0130.0090.0220.005
rheum_issue0.0000.0120.0220.0090.0080.0450.0220.0170.0170.0240.0000.0000.0220.0080.0210.0000.0000.0000.0160.0270.0210.0150.0160.0270.0180.0160.0140.0120.0150.0110.0220.0160.0050.0210.0190.0160.0190.0000.0110.0270.0090.0250.0250.0390.0210.0000.0190.0130.0210.0200.0001.0000.0350.0000.0190.0050.0000.0220.0140.002
rituximab0.0000.0000.0120.0000.0160.0000.0140.0180.0230.0150.0240.0270.0580.0000.0120.0120.0220.0290.0050.0000.0290.0170.0210.0200.0240.0280.0130.0040.0070.0000.0120.0120.0000.0110.0000.0150.0090.0140.0220.0190.0060.0190.0090.0810.0000.0190.0000.0130.0040.0470.0080.0351.0000.0180.0000.0310.0180.0160.013-0.019
sex_match0.0000.0470.0190.0330.0590.0260.0280.0520.0680.0420.0710.0930.0650.0850.0530.0150.0530.0420.0000.0080.0670.0610.0620.0640.0580.0610.0650.0620.0590.0530.0490.0510.0510.0490.0590.0830.0630.0420.0280.0290.0400.0290.0000.0500.0310.0590.0500.0430.0250.0570.0140.0000.0181.0000.0250.0310.0260.0420.007-0.001
tbi_status0.0100.0800.0240.0300.0580.0280.1350.0720.0340.0250.0480.1190.0310.0540.0300.0930.0910.1270.0060.0280.1570.1770.1640.1670.1930.1590.2450.2400.2240.2310.2370.2300.2390.2300.2560.3510.1930.1510.0340.2240.0950.0290.0070.0800.0510.0870.0250.0140.0210.0490.0190.0190.0000.0251.0000.0160.0250.0280.0040.059
tce_div_match0.0000.0330.0220.0220.0600.0390.0410.0440.0410.0000.0400.0580.0560.0350.0460.0540.0320.0580.0000.0290.0370.0310.0390.0390.0320.0410.0430.0480.0800.0520.0750.0770.0370.0590.0290.0520.0470.0350.0230.0890.0480.0330.0120.0870.0250.0250.0510.0700.0100.0430.0000.0050.0310.0310.0161.0000.5030.4540.0120.055
tce_imm_match0.0020.0260.0320.0090.0610.0270.0400.0350.0400.0100.0320.0710.0330.0280.0320.0610.0500.0450.0210.0280.0340.0390.0360.0410.0460.0460.0630.0660.0840.0710.0820.0850.0530.0620.0390.0610.0500.0230.0230.0810.0590.0370.0270.0540.0250.0550.0490.0790.0130.0440.0130.0000.0180.0260.0250.5031.0000.4590.0160.050
tce_match0.0090.0450.0250.0000.0700.0340.0470.0490.0420.0000.0580.0790.0470.0490.0510.0410.0270.0750.0020.0300.0500.0400.0510.0550.0480.0560.0850.0940.0810.0600.0890.0860.0410.0580.0340.0650.0720.0480.0260.0820.0690.0360.0160.0950.0310.0190.0550.0690.0150.0450.0090.0220.0160.0420.0280.4540.4591.0000.0230.007
vent_hist0.0110.0480.0000.0080.0110.0370.0210.0260.0000.0070.0250.0440.0510.0000.0110.0080.0630.0620.0150.0370.0500.0610.0550.0530.0500.0510.0370.0410.0490.0530.0410.0470.0350.0300.0430.0320.0540.0390.0220.0000.0300.0330.0040.0530.0000.0640.0150.0180.0200.0300.0220.0140.0130.0070.0040.0120.0160.0231.000-0.010
year_hct0.003-0.0020.001-0.008-0.0540.0870.0640.0690.0660.036-0.064-0.0490.075-0.111-0.128-0.065-0.018-0.1110.0230.041-0.158-0.161-0.167-0.164-0.166-0.170-0.133-0.144-0.153-0.161-0.166-0.161-0.119-0.120-0.153-0.153-0.148-0.069-0.022-0.072-0.0540.029-0.004-0.022-0.001-0.0220.0020.0560.019-0.0170.0050.002-0.019-0.0010.0590.0550.0500.007-0.0101.000
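
The "High correlation" alerts in the overview can be reproduced from a matrix like the one above by scanning its off-diagonal entries. The following is a minimal sketch, not the report's own implementation: the helper name `high_correlation_pairs` and the 0.5 cut-off are assumptions chosen for illustration, and the small frame holds only four coefficients copied from the matrix (efs, efs_time, graft_type, prod_type).

```python
import pandas as pd


def high_correlation_pairs(corr: pd.DataFrame, threshold: float = 0.5):
    """Return (row, column, value) for off-diagonal pairs with |value| >= threshold.

    `threshold` is an illustrative assumption, not a value stated in the report.
    """
    pairs = []
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        # Visit each unordered pair once (upper triangle only).
        for b in cols[i + 1:]:
            value = float(corr.loc[a, b])
            if abs(value) >= threshold:
                pairs.append((a, b, value))
    # Strongest association first, regardless of sign.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Coefficients copied from the matrix above for four variables.
names = ["efs", "efs_time", "graft_type", "prod_type"]
corr = pd.DataFrame(
    [
        [1.000, -0.855, 0.224, 0.223],
        [-0.855, 1.000, 0.211, 0.211],
        [0.224, 0.211, 1.000, 0.866],
        [0.223, 0.211, 0.866, 1.000],
    ],
    index=names,
    columns=names,
)

pairs = high_correlation_pairs(corr)
for a, b, value in pairs:
    print(f"{a} / {b}: {value:+.3f}")
```

With these four variables, only the efs / efs_time and graft_type / prod_type pairs pass the cut-off, which matches the corresponding alerts in the overview.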

Missing values

A simple visualization of nullity by column.
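
The numbers behind a per-column nullity chart are simple to compute with pandas. This is a hedged sketch on a toy frame: the column names are borrowed from the report, but the values are made up for illustration and the variable names are hypothetical.

```python
import pandas as pd

# Toy frame: column names come from the report, values are invented.
df = pd.DataFrame(
    {
        "dri_score": ["High", "Low", None, "High"],
        "donor_age": [72.29, None, None, 29.23],
        "efs": [0.0, 1.0, 0.0, 1.0],
    }
)

missing_count = df.isna().sum()        # number of missing cells per column
missing_pct = df.isna().mean() * 100   # share of rows missing, in percent

summary = pd.DataFrame({"missing": missing_count, "missing_pct": missing_pct})
print(summary.sort_values("missing", ascending=False))
```

Summing `missing_count` over all columns gives the dataset-wide "Missing cells" figure of the kind shown in the overview.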
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
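
The data behind such a matrix is a boolean mask with one cell per value. As a rough sketch (toy values, hypothetical variable names), the mask can also be collapsed into per-row patterns to see which combinations of columns tend to be missing together:

```python
import pandas as pd

# Toy frame: column names come from the report, values are invented.
df = pd.DataFrame(
    {
        "cyto_score": ["Poor", None, None, "Intermediate"],
        "mrd_hct": ["Positive", None, None, "Negative"],
        "efs_time": [4.672, 42.356, 19.793, 5.434],
    }
)

# True where a cell is missing; plotting this mask row by row gives a nullity matrix.
mask = df.isna()

# Encode each row as a string such as "110" (1 = missing, in column order),
# then count how often each pattern occurs.
patterns = mask.astype(int).astype(str).apply("".join, axis=1)
pattern_counts = patterns.value_counts()
print(pattern_counts)
```

Here the pattern "110" means cyto_score and mrd_hct are missing while efs_time is present.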
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
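
One common way to obtain a nullity correlation is to correlate the missingness indicators of the columns. The sketch below assumes a Pearson correlation of 0/1 indicators; whether the plot in this report was computed exactly this way is an assumption. The frame is a toy example with invented values.

```python
import pandas as pd

# Toy frame: column names come from the report, values are invented.
df = pd.DataFrame(
    {
        "cyto_score": ["Poor", None, None, "Intermediate", "Poor", None],
        "mrd_hct": ["Positive", None, None, "Negative", "Positive", None],
        "donor_age": [72.29, 29.23, None, None, 56.81, 27.27],
        "efs": [1, 0, 0, 0, 1, 1],
    }
)

mask = df.isna()

# Columns that are never (or always) missing have a constant indicator,
# so their correlation is undefined; drop them first.
varying = mask.loc[:, mask.nunique() > 1]

# Correlate the 0/1 missingness indicators: +1 means two columns are always
# missing together, -1 means one is present exactly when the other is missing.
nullity_corr = varying.astype(int).corr()
print(nullity_corr.round(3))
```

In this toy data cyto_score and mrd_hct are missing in the same rows, so their nullity correlation is 1, while efs is complete and drops out.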

Sample

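First rows

| | ID | dri_score | psych_disturb | cyto_score | diabetes | hla_match_c_high | hla_high_res_8 | tbi_status | arrhythmia | hla_low_res_6 | graft_type | vent_hist | renal_issue | pulm_severe | prim_disease_hct | hla_high_res_6 | cmv_status | hla_high_res_10 | hla_match_dqb1_high | tce_imm_match | hla_nmdp_6 | hla_match_c_low | rituximab | hla_match_drb1_low | hla_match_dqb1_low | prod_type | cyto_score_detail | conditioning_intensity | ethnicity | year_hct | obesity | mrd_hct | in_vivo_tcd | tce_match | hla_match_a_high | hepatic_severe | donor_age | prior_tumor | hla_match_b_low | peptic_ulcer | age_at_hct | hla_match_a_low | gvhd_proph | rheum_issue | sex_match | hla_match_b_high | race_group | comorbidity_score | karnofsky_score | hepatic_mild | tce_div_match | donor_related | melphalan_dose | hla_low_res_8 | cardiac | hla_match_drb1_high | pulm_moderate | hla_low_res_10 | efs | efs_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | N/A - non-malignant indication | No | NaN | No | NaN | NaN | No TBI | No | 6.0 | Bone marrow | No | No | No | IEA | 6.0 | +/+ | NaN | 2.0 | NaN | 6.0 | 2.0 | No | 2.0 | 2.0 | BM | NaN | NaN | Not Hispanic or Latino | 2016 | No | NaN | Yes | NaN | 2.0 | No | NaN | No | 2.0 | No | 9.942 | 2.0 | FKalone | No | M-F | 2.0 | More than one race | 0.0 | 90.0 | No | NaN | Unrelated | N/A, Mel not given | 8.0 | No | 2.0 | No | 10.0 | 0.0 | 42.356 |
| 1 | 1 | Intermediate | No | Intermediate | No | 2.0 | 8.0 | TBI +- Other, >cGy | No | 6.0 | Peripheral blood | No | No | No | AML | 6.0 | +/+ | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | PB | Intermediate | MAC | Not Hispanic or Latino | 2008 | No | Positive | No | Permissive | 2.0 | No | 72.290 | No | 2.0 | No | 43.705 | 2.0 | Other GVHD Prophylaxis | No | F-F | 2.0 | Asian | 3.0 | 90.0 | No | Permissive mismatched | Related | N/A, Mel not given | 8.0 | No | 2.0 | Yes | 10.0 | 1.0 | 4.672 |
| 2 | 2 | N/A - non-malignant indication | No | NaN | No | 2.0 | 8.0 | No TBI | No | 6.0 | Bone marrow | No | No | No | HIS | 6.0 | +/+ | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | BM | NaN | NaN | Not Hispanic or Latino | 2019 | No | NaN | Yes | NaN | 2.0 | No | NaN | No | 2.0 | No | 33.997 | 2.0 | Cyclophosphamide alone | No | F-M | 2.0 | More than one race | 0.0 | 90.0 | No | Permissive mismatched | Related | N/A, Mel not given | 8.0 | No | 2.0 | No | 10.0 | 0.0 | 19.793 |
| 3 | 3 | High | No | Intermediate | No | 2.0 | 8.0 | No TBI | No | 6.0 | Bone marrow | No | No | No | ALL | 6.0 | +/+ | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | BM | Intermediate | MAC | Not Hispanic or Latino | 2009 | No | Positive | No | Permissive | 2.0 | No | 29.230 | No | 2.0 | No | 43.245 | 2.0 | FK+ MMF +- others | No | M-M | 2.0 | White | 0.0 | 90.0 | Yes | Permissive mismatched | Unrelated | N/A, Mel not given | 8.0 | No | 2.0 | No | 10.0 | 0.0 | 102.349 |
| 4 | 4 | High | No | NaN | No | 2.0 | 8.0 | No TBI | No | 6.0 | Peripheral blood | No | No | No | MPN | 6.0 | +/+ | 10.0 | 2.0 | NaN | 5.0 | 2.0 | No | 2.0 | 2.0 | PB | NaN | MAC | Hispanic or Latino | 2018 | No | NaN | Yes | NaN | 2.0 | No | 56.810 | No | 2.0 | No | 29.740 | 2.0 | TDEPLETION +- other | No | M-F | 2.0 | American Indian or Alaska Native | 1.0 | 90.0 | No | Permissive mismatched | Related | MEL | 8.0 | No | 2.0 | No | 10.0 | 0.0 | 16.223 |
| 5 | 5 | High | No | Poor | Yes | 2.0 | 7.0 | TBI + Cy +- Other | No | 4.0 | Peripheral blood | No | No | No | ALL | 5.0 | +/+ | 8.0 | 1.0 | P/P | 6.0 | 1.0 | No | 1.0 | 1.0 | PB | TBD | MAC | Hispanic or Latino | 2015 | Yes | NaN | No | NaN | 2.0 | No | 27.274 | No | 1.0 | No | 32.143 | 2.0 | Cyclophosphamide alone | No | F-F | 1.0 | White | 2.0 | 90.0 | No | Permissive mismatched | Related | N/A, Mel not given | 5.0 | No | 2.0 | Yes | 6.0 | 1.0 | 7.095 |
| 6 | 6 | Low | No | Poor | No | 2.0 | 8.0 | No TBI | No | 6.0 | Bone marrow | No | No | No | ALL | 6.0 | -/+ | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | BM | NaN | RIC | Not Hispanic or Latino | 2016 | No | NaN | No | Permissive | 2.0 | No | 45.016 | No | 2.0 | No | 17.673 | 2.0 | FK+ MMF +- others | No | M-M | 2.0 | More than one race | 1.0 | 90.0 | No | Permissive mismatched | Unrelated | N/A, Mel not given | 8.0 | No | 2.0 | Yes | 10.0 | 0.0 | 46.464 |
| 7 | 7 | High | No | NaN | Not done | 2.0 | 5.0 | TBI + Cy +- Other | No | 3.0 | Peripheral blood | No | No | No | IIS | 3.0 | -/- | 6.0 | 1.0 | NaN | 3.0 | 2.0 | No | 1.0 | 1.0 | BM | NaN | NaN | Not Hispanic or Latino | 2018 | No | NaN | Yes | NaN | 1.0 | No | 23.102 | No | 1.0 | No | 11.073 | 1.0 | Cyclophosphamide alone | No | M-F | 1.0 | More than one race | 0.0 | 90.0 | No | NaN | Related | N/A, Mel not given | 5.0 | No | 1.0 | No | 6.0 | 0.0 | 18.076 |
| 8 | 8 | Intermediate | No | Other | No | NaN | NaN | TBI + Cy +- Other | No | 6.0 | Peripheral blood | No | No | No | ALL | NaN | -/+ | NaN | 2.0 | NaN | NaN | 2.0 | No | 2.0 | 2.0 | PB | NaN | MAC | Hispanic or Latino | 2008 | No | Negative | No | NaN | NaN | No | 36.010 | No | 2.0 | No | 35.517 | 2.0 | FK+ MMF +- others | No | F-F | 2.0 | American Indian or Alaska Native | 3.0 | 90.0 | No | NaN | Related | N/A, Mel not given | 8.0 | No | NaN | Yes | 10.0 | 1.0 | 10.130 |
| 9 | 9 | Intermediate | No | Intermediate | No | 2.0 | 8.0 | No TBI | No | 6.0 | Peripheral blood | No | No | No | ALL | 6.0 | NaN | 10.0 | 2.0 | G/B | 6.0 | 2.0 | No | 2.0 | 1.0 | PB | Intermediate | MAC | Not Hispanic or Latino | 2017 | No | Positive | Yes | Permissive | 2.0 | No | 55.871 | No | 2.0 | No | 37.217 | 2.0 | FK+ MMF +- others | No | F-F | 2.0 | American Indian or Alaska Native | 1.0 | 70.0 | No | GvH non-permissive | Unrelated | N/A, Mel not given | 8.0 | No | 2.0 | No | 9.0 | 1.0 | 5.434 |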
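
Last rows

| | ID | dri_score | psych_disturb | cyto_score | diabetes | hla_match_c_high | hla_high_res_8 | tbi_status | arrhythmia | hla_low_res_6 | graft_type | vent_hist | renal_issue | pulm_severe | prim_disease_hct | hla_high_res_6 | cmv_status | hla_high_res_10 | hla_match_dqb1_high | tce_imm_match | hla_nmdp_6 | hla_match_c_low | rituximab | hla_match_drb1_low | hla_match_dqb1_low | prod_type | cyto_score_detail | conditioning_intensity | ethnicity | year_hct | obesity | mrd_hct | in_vivo_tcd | tce_match | hla_match_a_high | hepatic_severe | donor_age | prior_tumor | hla_match_b_low | peptic_ulcer | age_at_hct | hla_match_a_low | gvhd_proph | rheum_issue | sex_match | hla_match_b_high | race_group | comorbidity_score | karnofsky_score | hepatic_mild | tce_div_match | donor_related | melphalan_dose | hla_low_res_8 | cardiac | hla_match_drb1_high | pulm_moderate | hla_low_res_10 | efs | efs_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28790 | 28790 | Intermediate | No | Intermediate | No | 1.0 | 4.0 | TBI + Cy +- Other | No | 3.0 | Peripheral blood | No | No | No | AML | 3.0 | +/+ | 6.0 | 2.0 | P/P | 3.0 | 1.0 | No | 1.0 | 2.0 | PB | Intermediate | MAC | Not Hispanic or Latino | 2018 | No | Negative | No | NaN | 1.0 | No | 52.643 | No | 1.0 | No | 36.143 | 1.0 | Cyclophosphamide +- others | No | F-F | 1.0 | More than one race | 0.0 | 70.0 | Yes | Permissive mismatched | Related | N/A, Mel not given | 4.0 | No | 1.0 | No | 6.0 | 1.0 | 12.940 |
| 28791 | 28791 | Intermediate | No | Normal | No | 1.0 | 4.0 | TBI + Cy +- Other | No | 3.0 | Peripheral blood | No | No | No | ALL | 3.0 | +/+ | 5.0 | 1.0 | P/P | 3.0 | 1.0 | No | 1.0 | 1.0 | PB | NaN | NMA | Not Hispanic or Latino | 2008 | No | Negative | No | NaN | 1.0 | No | 38.223 | No | 1.0 | No | 59.364 | 1.0 | Cyclophosphamide alone | No | M-M | 1.0 | White | 0.0 | 70.0 | No | Permissive mismatched | Related | N/A, Mel not given | 4.0 | No | 1.0 | No | 5.0 | 1.0 | 3.428 |
| 28792 | 28792 | N/A - non-malignant indication | No | NaN | No | 2.0 | 8.0 | No TBI | No | 6.0 | Peripheral blood | No | No | No | IIS | 6.0 | +/+ | 10.0 | 2.0 | G/B | 6.0 | 2.0 | No | 2.0 | 2.0 | PB | NaN | NaN | Not Hispanic or Latino | 2009 | No | NaN | Yes | HvG non-permissive | 2.0 | Yes | NaN | No | 2.0 | No | 17.580 | 2.0 | FK+ MMF +- others | No | F-M | 2.0 | More than one race | 1.0 | 80.0 | No | HvG non-permissive | Unrelated | N/A, Mel not given | 8.0 | No | 2.0 | No | 10.0 | 1.0 | 6.543 |
| 28793 | 28793 | N/A - non-malignant indication | No | NaN | No | 2.0 | 8.0 | No TBI | No | 6.0 | Bone marrow | No | No | No | IEA | 6.0 | +/+ | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | BM | NaN | NaN | Not Hispanic or Latino | 2016 | No | NaN | Yes | Permissive | 2.0 | No | 33.118 | No | 2.0 | No | 13.988 | 2.0 | FK+ MMF +- others | No | M-M | 2.0 | Native Hawaiian or other Pacific Islander | 0.0 | 100.0 | No | GvH non-permissive | NaN | MEL | 8.0 | No | 2.0 | No | 10.0 | 1.0 | 5.802 |
| 28794 | 28794 | N/A - pediatric | No | NaN | No | 2.0 | 8.0 | No TBI | No | 6.0 | Peripheral blood | No | No | No | SAA | 6.0 | +/+ | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | BM | NaN | RIC | Not Hispanic or Latino | 2016 | No | NaN | Yes | Permissive | 2.0 | No | 24.417 | No | 2.0 | No | 65.249 | 2.0 | FK+ MTX +- others(not MMF) | No | M-F | 2.0 | Asian | 1.0 | 90.0 | No | Permissive mismatched | Unrelated | N/A, Mel not given | 8.0 | No | 2.0 | No | 10.0 | 1.0 | 6.279 |
| 28795 | 28795 | Intermediate - TED AML case <missing cytogenetics | NaN | Favorable | No | 2.0 | 8.0 | No TBI | No | 6.0 | Peripheral blood | No | No | NaN | ALL | 6.0 | -/- | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | PB | Intermediate | MAC | Not Hispanic or Latino | 2018 | No | Negative | Yes | Fully matched | 2.0 | No | 24.212 | Yes | 2.0 | No | 51.136 | 2.0 | FK+ MTX +- others(not MMF) | NaN | M-F | 2.0 | More than one race | 0.0 | NaN | NaN | Bi-directional non-permissive | NaN | N/A, Mel not given | 8.0 | NaN | 2.0 | No | 10.0 | 0.0 | 18.633 |
| 28796 | 28796 | High | No | Poor | Yes | 1.0 | 4.0 | No TBI | No | 5.0 | Peripheral blood | No | No | No | AML | 3.0 | -/+ | 6.0 | 2.0 | G/G | 4.0 | 1.0 | No | 2.0 | 2.0 | PB | TBD | RIC | Hispanic or Latino | 2017 | No | Positive | No | NaN | 1.0 | No | 30.770 | No | 1.0 | No | 18.075 | 2.0 | Cyclophosphamide +- others | No | M-F | 1.0 | Native Hawaiian or other Pacific Islander | 3.0 | 90.0 | No | GvH non-permissive | Related | N/A, Mel not given | 6.0 | Yes | 1.0 | Yes | 8.0 | 1.0 | 4.892 |
| 28797 | 28797 | TBD cytogenetics | NaN | Poor | NaN | 2.0 | 8.0 | No TBI | NaN | 6.0 | Peripheral blood | No | NaN | NaN | IPA | 6.0 | -/+ | 10.0 | 2.0 | G/G | 6.0 | 2.0 | NaN | 2.0 | 2.0 | PB | Poor | MAC | Not Hispanic or Latino | 2018 | No | NaN | No | GvH non-permissive | 2.0 | No | 22.627 | No | 2.0 | NaN | 51.005 | 2.0 | FK+ MMF +- others | NaN | M-F | 2.0 | Native Hawaiian or other Pacific Islander | 5.0 | 90.0 | NaN | GvH non-permissive | Unrelated | N/A, Mel not given | 8.0 | NaN | 2.0 | No | 10.0 | 0.0 | 23.157 |
| 28798 | 28798 | N/A - non-malignant indication | No | Poor | No | 1.0 | 4.0 | No TBI | No | 3.0 | Peripheral blood | No | NaN | NaN | IPA | 3.0 | +/+ | 5.0 | 1.0 | P/P | 3.0 | 1.0 | No | 1.0 | 1.0 | PB | NaN | NMA | Not Hispanic or Latino | 2018 | NaN | NaN | Yes | NaN | 1.0 | No | 58.074 | Yes | 1.0 | NaN | 0.044 | 1.0 | Cyclophosphamide alone | No | M-M | 1.0 | Black or African-American | 1.0 | 90.0 | No | Permissive mismatched | Related | MEL | 4.0 | No | 1.0 | No | 5.0 | 0.0 | 52.351 |
| 28799 | 28799 | N/A - pediatric | No | NaN | No | 2.0 | 8.0 | No TBI | No | 6.0 | Bone marrow | No | No | No | SAA | 6.0 | +/+ | 10.0 | 2.0 | P/P | 6.0 | 2.0 | No | 2.0 | 2.0 | BM | NaN | NaN | Not Hispanic or Latino | 2018 | No | NaN | Yes | NaN | 2.0 | No | 30.571 | No | 2.0 | No | 1.035 | 2.0 | Cyclophosphamide +- others | No | M-M | 2.0 | Black or African-American | 2.0 | 90.0 | No | Permissive mismatched | Related | MEL | 8.0 | No | 2.0 | Yes | 10.0 | 0.0 | 25.158 |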
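
The sample above covers IDs 0 to 9 and 28790 to 28799, that is, the first and last ten of the 28800 observations. A comparable preview can be produced with pandas as sketched below. The helper name `sample_rows` is hypothetical, and the stand-in frame reproduces only the ID range; its `efs` values are invented.

```python
import pandas as pd


def sample_rows(df: pd.DataFrame, n: int = 10):
    """Return the first and last n rows, as in a head/tail sample."""
    return df.head(n), df.tail(n)


# Stand-in frame with the same number of observations as the report.
# Only the ID range mirrors the dataset; the efs values are invented.
df = pd.DataFrame({"ID": range(28800), "efs": [i % 2 for i in range(28800)]})

first, last = sample_rows(df)
print(first["ID"].tolist())
print(last["ID"].tolist())
```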